Pattern Identification in Time-Series Social Media Data, and Output-Dynamics Engineering for a Dynamic System Having One or More Multi-Scale Time-Series Data Sets
In some aspects, computer-implemented methods of identifying patterns in time-series social-media data. In an embodiment, the method includes applying a deep-learning methodology to the time-series social-media data at a plurality of temporal resolutions to identify patterns that may exist at and across ones of the temporal resolutions. A particular deep-learning methodology that can be used is a recursive convolutional Bayesian model (RCBM) utilizing a special convolutional operator. In some aspects, computer-implemented methods of engineering outcome-dynamics of a dynamic system. In an embodiment, the method includes training a generative model using one or more sets of time-series data and solving an optimization problem composed of a likelihood function of the generative model and a score function reflecting a utility of the dynamic system. A result of the solution is an influence indicator corresponding to intervention dynamics that can be applied to the dynamic system to influence outcome dynamics of the system.
This application is a divisional of U.S. Nonprovisional patent application Ser. No. 15/406,268, filed on Jan. 13, 2017, and titled “Pattern Identification in Time-Series Social Media Data, and Output-Dynamics Engineering for a Dynamic System Having One or More Multi-Scale Time-Series Data Sets”, which application claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 62/388,074, filed Jan. 15, 2016, and titled “PATTERN IDENTIFICATION AND DATA-DRIVEN ENGINEERING OF SOCIAL DYNAMICS”. Each of these applications is incorporated by reference herein in its entirety.
GOVERNMENT RIGHTS CLAUSE
This invention was made with government support under CCF1314876 awarded by National Science Foundation. The government has certain rights in the invention.
FIELD OF THE INVENTION
The present invention generally relates to the field of computer-implemented time-series data analysis and data-driven engineering. In particular, the present invention is directed to pattern identification in time-series social media data, and output-dynamics engineering for a dynamic system having one or more multi-scale time-series data sets.
BACKGROUND
All activities in social networks evolve over time. Consequently, understanding the structures behind social dynamics represents a central question in social networks research, with many important applications including political campaigning, viral marketing, and disaster response. Many researchers have studied temporal patterns of social activities. These studies often cover various types of social dynamics, including the numbers of propagators and commentators, the breadth and depth of the propagation tree, the persistence of hashtags, and general graph statistics (e.g., the graph diameter).
Another line of research targets the systematic pattern discovery of social dynamics. Much of this work conducts pattern mining using distance-based clustering. For example, one group of researchers uses spectral clustering for one-dimensional dynamics. Also, an efficient mean-shift clustering algorithm has been proposed for multi-dimensional social dynamics. Other researchers use model-based methods to identify dynamics patterns. For example, another group of researchers uses a Gaussian Mixture model to analyze the proportions of readership before, at, and after the peak. Yet another group of researchers has proposed a deep-learning method that is capable of mining patterns of multiple time scales.
Many previous works are devoted to the modeling of social dynamics. Some of them are generative in nature and define a probability distribution of social dynamics. There are also predictive models, where a probability distribution can be indirectly defined, e.g., by introducing Gaussian noise.
Despite the large amount of research in the field of social dynamics, it remains desirable to improve the ability to identify structures behind social dynamics, both in terms of solution quality and computational efficiency. It also appears that no one has yet developed a rigorous methodology for engineering the outcome dynamics of a social system. A reason for this is that it is generally difficult to use a predictive model alone to solve engineering tasks; this is because, by definition, intervention is not considered in dynamics prediction, but is required in dynamics engineering.
SUMMARY OF THE DISCLOSURE
In one implementation, the present disclosure is directed to a computer-implemented method of determining patterns with a time-series social-media data set. The computer-implemented method includes receiving, by a social-media-data pattern-identification system, the time-series social-media data set; applying, by the social-media-data pattern-identification system, a deep-learning algorithm to the time-series social-media data set, wherein the deep-learning algorithm is designed and configured to analyze the time-series social-media data set for patterns across multiple time scales and to output pattern-identification data containing information on patterns in a plurality of the multiple time scales and across a plurality of the multiple time scales; and providing the output pattern-identification data to an output-interface of the social-media-data pattern-identification system.
In another implementation, the present disclosure is directed to a computer-implemented method of engineering outcome dynamics of a dynamic system that includes one or more multi-scale time-series data sets. The computer-implemented method includes training a generative model using each of the one or more multi-scale time-series data sets; providing an optimization problem composed of a likelihood function of the generative model and a score function that reflects a utility of the dynamic system; solving the optimization problem so as to determine an influence indicator indicating an influence scheme for influencing the outcome dynamics; and providing the influence indicator to an outcome-dynamics influencing system.
For the purpose of illustrating the invention, the drawings show aspects of one or more embodiments of the invention. However, it should be understood that the present invention is not limited to the precise arrangements and instrumentalities shown in the drawings, wherein:
In some aspects, the present invention is directed to computer-implemented methods of identifying compositional structures, or patterns, that underlie social dynamics present in D-dimensional time-series social media data from online social media. In this context, the term “online social media” may be construed broadly to mean any online forum to which users can provide content (e.g., post comments, post memes, post emojis, select selectors (such as a thumb's-up or thumb's-down selector), post images, post videos, etc., and any combination thereof). Examples of such online social media include, but are not limited to, social network platforms (e.g., the TWITTER®, SNAPCHAT®, FACEBOOK®, INSTAGRAM®, etc., social media platforms) and review/comment platforms (e.g., apps such as YELP, FOURSQUARE, ZAGAT, OPENTABLE, etc., and review/comment sections of virtually any sort of online source, such as news sites, video sites (e.g., YOUTUBE, VIMEO, VID.ME, etc.), online shopping sites, blog sites, etc.), among others. Fundamentally, there is no limitation on the online source(s) of the D-dimensional time-series data.
Social media exhibits rich yet distinct temporal dynamics that cover a wide range of different scales; in other words, social-media data exhibits multi-scale compositionality. In order to study these complex dynamics, two fundamental questions revolve around (1) the signatures of social dynamics at different time scales, and (2) the way in which these signatures interact and form higher-level meanings. At a high level and as described in greater detail below, features of such methods involve constructing and applying a deep-learning framework to time-series social dynamics data. In some embodiments, the deep-learning framework uses specialized convolution operators that are designed to exploit the inherent heterogeneity of social dynamics.
In other aspects, the present invention is directed to computer-implemented methods of influencing, or engineering, outcome dynamics of a dynamic system having one or more multi-scale time-series data streams. In the context of the present disclosure, engineering outcome dynamics involves determining an intervention using a generative model learned from the one or more multi-scale time-series data streams and applying that intervention to the dynamic system so as to influence, or engineer, one or more aspects of the output dynamics of the dynamic system. As a brief example, consider the problem of how a company can best use the TWITTER® social-media platform to implement a marketing campaign for a new product. Here, the company has a range of options for the campaign, such as the number of “tweeters” to use, the frequency of “tweeting,” the content of the tweets, whether or not to use “retweeters,” the duration of the tweeting campaign etc., as well as a budget for the campaign. The question becomes: what is a combination of campaign options (intervention dynamics) to be applied to the TWITTER® social-media platform (dynamic system) that provides the optimal value (outcome dynamics) for the marketing campaign budget? Many other applications of outcome-dynamics engineering would be useful, and several are addressed below.
Pattern Identification
As mentioned above, some aspects of the present invention are directed to methods of identifying patterns in social-media data at differing time scales and across differing time scales, for example, to extract broader meanings from the patterns. Pattern identification and other types of analysis of social media data are increasingly important technologies as social media platforms become primary platforms for social interactions ranging from commentary and discourse to marketing, politics, and many other types of social interactions. For example, pattern identification technologies can be applied to a wide variety of tasks including, but not limited to, monitoring, tracking, analysis, prediction, marketing, political promotion, and dynamics engineering, among others. Those skilled in the art will readily understand the importance of pattern-identification technology in the social media realm and will greatly appreciate the improvements a multi-scale pattern-identification methodology of the present disclosure brings to this technology. In view of its importance, dynamics engineering is addressed in great detail below.
Referring now to the drawings,
To aid in reader understanding of method 100 of
At step 110, social-media-data pattern-identification system 208 applies, here, via algorithm 216, a deep-learning algorithm to time-series social-media data set 204A. The deep-learning algorithm is designed and configured to analyze data in time-series social-media data set 204A for patterns across multiple time scales and to output, at step 115, pattern identification data containing information on patterns in a plurality of the multiple time scales and across a plurality of the multiple time scales. In one example, which is described in detail below, algorithm 216 performs a recursive convolutional Bayesian model (RCBM) for such pattern detection. The RCBM disclosed herein can be particularly useful because of its special convolutional operator, which is described in detail below, that tailors the gradient used in the learning process of the RCBM to the type of data at issue, i.e., time-series social-media data 204A.
In a particular example of an RCBM algorithm described below, a step may be to let the current temporal resolution be the finest that is relevant for the particular application at issue (e.g., seconds). Under the current temporal resolution, the RCBM algorithm iteratively learns the relevant patterns (and the time and strength at which they are activated), using updating rules, such as the updating rules in Algorithms 1 and 2 presented below. The temporal resolution may then be increased by max-pooling (i.e., taking the maximal value over consecutive values) activation strength vectors. Then, the RCBM algorithm may proceed back to the iterative learning if the new resolution is still relevant to the application. Otherwise, the algorithm may terminate and output pattern-identification data 220 that includes the patterns identified in all relevant temporal resolutions. Further details of this example are presented below in the next section.
A deep-learning algorithm, such as the RCBM explicitly disclosed herein, of social-media-data pattern-identification system 208 can be used, for example, to improve anomaly detection and dynamics forecasting in online social media. For example and in the context of the RCBM disclosed herein, the patterns learned by the RCBM can be plugged into Equation 12 (see below) to measure the degree of anomaly of the social dynamics. The RCBM method can successfully detect abnormal behaviors in multiple temporal resolutions, such as urgent messages and service shutdown on the TWITTER® platform and adult entertainment and consistently-outstanding restaurants on the YELP® platform. These features cannot be detected by conventional methods, for example, that are based on term-frequencies. These examples are detailed in the tables of
In some embodiments, pattern-identification data 220 may be output to an output interface 224 of social-media-data pattern-identification system 208. In turn, output interface 224 may output pattern-identification information 228 to a monitoring/analysis/intervention system 232 that may perform any one or more of monitoring, analysis, and intervention functions, among other functions. Output interface 224 may have any suitable configuration and function, such as an application programming interface that essentially only passes through pattern-identification data 220 as pattern-identification information 228, or a more sophisticated interface that receives the pattern-identification data as raw data and operates on the data, such as to create graphs and other higher-level information as the pattern-identification information. In this connection, monitoring/analysis/intervention system 232 may take any of a wide variety of forms, such as one or more display devices (e.g., video monitors), one or more display devices in combination with one or more human personnel charged with monitoring and/or analyzing pattern-identification information 228, and/or an outcome-dynamics-engineering system (e.g., outcome-dynamics-engineering system 2608 of
Introduction. All activities in social networks evolve over time. Consequently, understanding the structures behind social dynamics represents a central question in social networks research, with many important applications including, for example, political campaigning, viral marketing, and disaster response. While several recent works have investigated methods to identify patterns of social dynamics, this disclosure addresses a new, unexplored perspective of social dynamics, namely, multi-scale compositionality.
Studying multi-scale compositionality consists of identifying compositional structures of social media dynamics, which generally covers two tasks:
- T1. Identifying multi-scale signatures, which comprises identifying distinct signatures across a range of time scales, as opposed to sticking with a single one; and
- T2. Mining of compositional interactions, which requires discovering the interaction among multiple such signatures that produce higher-level meanings.
To illustrate these tasks, consider the case of human face recognition, wherein the first task includes recognizing the eyebrows, the cheeks, or the overall head shape. In contrast, the second task includes gauging the distance between the eyebrows, measuring the angle between the jaw and the ears, or recognizing the polygon formed by the lips, cheeks, and eyebrows. To recognize a human face, both tasks are equally important: one could make a mistake by either recognizing the wrong shape of an eyebrow, or by over/underestimating the distance between the eyebrows.
In the context of social dynamics, the same two tasks are found to be equally relevant. Indeed, social media exhibit distinct signatures at various time scales that range, for example, from seconds to days, whereas different combinations of such signatures can have totally different meanings and consequences. For example, an intense popularity of some keywords followed by a vibrant discussion may indicate a trendy event; however, the same popularity without any follow-up discussion can, on the contrary, indicate an internet scam. Clearly, being able to distinguish between the two cases can make a big difference.
This disclosure introduces a new model, namely, the RCBM, which is capable of addressing both tasks. An idea of RCBM is building a layered structure of signature detectors, wherein each layer is responsible for a specific time scale. Moreover, a higher-level layer is capable of detecting the interactions of various signatures (as they come from its immediate lower layer), and hence can identify compositional structures.
To the best of the present inventors' knowledge, this work brings at least the following new contributions:
- 1. Design and Analysis of RCBM: RCBM is a new deep-learning framework based on specialized convolution operators. While the formulation of RCBM is general enough to consider the heterogeneity of social signals, its runtime performance and solution quality are analyzed formally and confirmed experimentally. Of note, this is the first time that deep learning is used in the context of social dynamics.
- 2. Identifying Compositional Structures of Social Dynamics: In one exemplary experiment using RCBM, it is discovered that the social dynamics in a TWITTER® dataset are characterized by signatures representing the dynamics' popularity, contagiousness, stickiness, and interactivity. In another exemplary experiment using RCBM, the social dynamics in YELP® datasets are characterized by signatures representing how different groups of reviewers rate individual businesses. Further, the patterns where these signatures interact by generating, enhancing, or dominating one another can be found.
- 3. RCBM-Enabled Applications: New applications enabled by RCBM are exemplified, including the detection of abnormal social dynamics and the forecasting of social dynamics with features extracted using RCBM.
Problem Definition. A generic “information token” (e.g., a YOUTUBE® video, photo, hashtag, etc.) is used as the proxy for social dynamics. Since the social dynamics that emerge while an information token is being propagated across a social network can be characterized by multiple statistics (e.g., the ones mentioned above), $X \in \mathbb{R}^{D \times T}$ is used to represent the D-dimensional social dynamics corresponding to an information token (e.g., D=2 for the X in
- A1. Finite Structures: the social dynamics can be characterized by a finite number of structures that are invariant to shifting in time and scaling in magnitude.
- A2. Burstiness: the distribution for the magnitude of the social dynamics is right-skewed; it is typically small but can be occasionally very large.
- A3. Heterogeneity: for each D-dimensional structure, all dimensions have different meanings and no one is an exact copy of another.
Model. In this embodiment, a convolutional Bayesian model (CBM) is used as the basis for the deep learning model. For the model, each social dynamic X is postulated as being generated by random activations of filters. For illustration, consider
Formally, given a set of K filters $\{W_k\}_{k=1}^{K}$, our generation process for a social dynamic X is:
1. Sample $\{h_k\}_{k=1}^{K}$ such that $h_k[t] \sim \mathrm{Exp}(\beta)\ \forall k, t$
2. $X = \sum_k W_k \otimes h_k + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$. (1)
wherein Exp(⋅) and N(⋅) denote the Exponential and Normal distributions, respectively, with parameters β and σ. Also, ⊗ denotes a specialized convolutional operator that carries out the “scale-and-copy” task illustrated in
Effectively, ⊗ does D 1-D convolutions between each row of W and the entire h, and puts the results back to each row of the output matrix separately. Moreover, the above generation process implies a joint distribution P(X,h)=P (X|h)P(h) where:
CBM Features. The design of CBM closely reflects assumptions A1 to A3, noted above. To address assumption A1, a convolutional formulation is used such that the structures (i.e., the filters W's) are invariant to shifting in time and scaling in magnitude. To address assumption A2, burstiness is enforced by assuming that the magnitude of the activation vectors (i.e., h's) follows an exponential distribution, which is typically small but occasionally large. This will also enforce sparsity for activation vectors during model learning (see below). To address assumption A3, heterogeneity is considered using the specialized convolutional operator ⊗ noted above instead of the conventional matrix convolution. This provides provable advantages in both runtime and solution quality.
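As a non-limiting illustration, the two-step generation process of Equation 1 can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text above: the row-wise operator is taken to be a "valid" 1-D convolution (so that an activation vector of length T+Tw−1 yields T output columns), β is treated as the rate of the exponential distribution, and the names `scale_and_copy` and `generate_dynamic` are hypothetical.

```python
import random


def scale_and_copy(W, h):
    """Convolve every row of the D x Tw filter W with the same activation vector h.

    Assumption: "valid" convolution, so h of length T + Tw - 1 gives T columns.
    """
    Tw = len(W[0])
    T = len(h) - Tw + 1
    return [
        [sum(row[s] * h[t + Tw - 1 - s] for s in range(Tw)) for t in range(T)]
        for row in W
    ]


def generate_dynamic(filters, T, beta, sigma, rng):
    """Sample one D x T dynamic X and the activation vectors that produced it."""
    D, Tw = len(filters[0]), len(filters[0][0])
    # Start from the Gaussian noise term.
    X = [[rng.gauss(0.0, sigma) for _ in range(T)] for _ in range(D)]
    activations = []
    for W in filters:
        # Step 1: exponential activations (beta used as the rate; an assumption).
        h = [rng.expovariate(beta) for _ in range(T + Tw - 1)]
        activations.append(h)
        # Step 2: add this filter's scaled, shifted copies to X.
        for d, row in enumerate(scale_and_copy(W, h)):
            for t in range(T):
                X[d][t] += row[t]
    return X, activations
```

For example, `scale_and_copy([[1.0, 2.0], [0.0, 1.0]], [0.0, 3.0, 0.0])` places one copy of the filter scaled by 3 into the output, which is the behavior the term “scale-and-copy” suggests.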
CBM Model Learning. Since, given W and h, the Maximum Likelihood Estimators (MLE) for σ and β (in Equation 3) can be calculated in closed form, the main challenge for learning a CBM lies in estimating W in the presence of the hidden variables $h_k$. Formally, the problem can be written as:
Assuming that P(W,h) peaks with respect to h, we obtain the approximation:
where $\|\cdot\|_F$ denotes the Frobenius norm. Now, considering a set of n samples of social dynamics $\{X^{(i)}\}_{i=1}^{n}$ and their corresponding activation vectors $\{\{h_k^{(i)}\}_{k=1}^{K}\}_{i=1}^{n}$, Equation 5 becomes:
In Equation 6, two additional constraints are incorporated to improve the solution quality of W. Specifically, the first constraint prevents Wk from blowing up, because otherwise the objective function can be trivially improved by scaling up (and down) Wk (and hk) by the same factor. Also, the second constraint helps ensure that the signs of Wk are not arbitrary and hence can be interpreted coherently. It is noted that Equation 6 is similar to sparse coding, with two important distinctions. First, the conventional matrix multiplication is used in sparse coding, whereas a convolutional formulation is used in Equation 6. Second, in sparse coding, the penalty strength (usually denoted as λ) needs to be tuned manually, whereas in Equation 6, the value of
can be assigned using MLE with a straightforward meaning.
To solve Equation 6, since the problem is convex with respect to each one of W and h (but not both), solving alternates between optimizing over one of them while keeping the other one fixed. To start with, the derivatives of the smooth part of the objective function (i.e., $f_1(W,h)=\frac{1}{2}\|X^{(i)}-\sum_k W_k\otimes h_k^{(i)}\|_F^2$) are first derived with respect to h and W:
$\nabla f_1(h_k^{(i)}) = \tilde{W}_k \odot \left(\sum_j W_j \otimes h_j^{(i)} - X^{(i)}\right)$
$\nabla f_1(W_k) = \sum_i \tilde{h}_k^{(i)} \otimes \left(\sum_j W_j \otimes h_j^{(i)} - X^{(i)}\right)$. (7)
Here, the deconvolution operator ⊙ is defined as:
Again, the ⊙ operator differs from the conventional matrix convolution. Effectively, it calculates the 1-D convolutions of individual rows of W and X separately, and then adds them together to form a single row. This brings the same advantages as the ⊗ operator does, as mentioned above.
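The row-wise behavior described above can be illustrated with a short Python sketch of the gradient of the smooth objective with respect to one activation vector. It is offered only as an example under assumptions: the reconstruction uses a "valid" row-wise convolution, the time reversal implied by the tilde notation is absorbed into the index arithmetic, and the function names `reconstruct` and `grad_wrt_activation` are hypothetical.

```python
def reconstruct(filters, activations):
    """Sum over k of W_k (x) h_k using row-wise "valid" 1-D convolutions."""
    D, Tw = len(filters[0]), len(filters[0][0])
    T = len(activations[0]) - Tw + 1
    X_hat = [[0.0] * T for _ in range(D)]
    for W, h in zip(filters, activations):
        for d in range(D):
            for t in range(T):
                X_hat[d][t] += sum(W[d][s] * h[t + Tw - 1 - s] for s in range(Tw))
    return X_hat


def grad_wrt_activation(W, residual):
    """Combine each row of W with the matching residual row, then add the rows.

    residual is the D x T matrix (reconstruction minus X). The result is a
    single vector with one entry per activation (length T + Tw - 1).
    """
    D, Tw, T = len(W), len(W[0]), len(residual[0])
    grad = [0.0] * (T + Tw - 1)
    for d in range(D):            # one 1-D pass per row ...
        for t in range(T):
            for s in range(Tw):
                # ... with all rows accumulated into the same output vector.
                grad[t + Tw - 1 - s] += W[d][s] * residual[d][t]
    return grad
```

Because the D rows are processed independently and only summed at the end, the cost grows linearly in D, which is one way to read the runtime advantage attributed to the specialized operators.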
Stepsize Assignment. Typically, one can use line search to determine the stepsize in gradient-based methods. In the present case, however, doing so would slow down the optimization considerably because the line search itself needs many additional convolutions. Therefore, the following stepsize assignments for h and W, respectively, are derived:
wherein α∈(0,2). These stepsize assignments are essential to ensure good runtime and convergence properties.
Overall Algorithm. Algorithm 1, below, provides the pseudocode for CBM learning. Algorithm 1 takes as inputs a set of n sample social dynamics $\{X^{(i)}\}_{i=1}^{n}$, the scale of the filters $T_w$, and the number of filters K, and produces as outputs all model parameters including $\{W_k[r]\}_{k=1}^{K}$, σ, and β. In each iteration of the main repeat loop of Algorithm 1, three tasks are executed in turn: Task 1 (the first for-loop) consists of solving Equation 6 with respect to h; Task 2 (the second loop) consists of advancing one step toward the solution of Equation 6 with respect to W; Task 3 (the remaining two lines) consists of calculating the MLE for σ and β. The details of Task 1 are presented in Algorithm 2, below. Algorithm 2 is designed based on Nesterov acceleration and the proximal method, where the function $S_\lambda^+(\cdot)$ is an element-wise function defined as:
Task 2 is conceptually similar to Task 1, where Π(⋅) is defined as:
One distinction is that instead of solving h until convergence as in Task 1, only a single update is conducted here. Task 3 calculates the closed-form solution of MLE for σ and β. Since the whole algorithm can be viewed as a case of coordinate descent, it is guaranteed to converge.
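For illustration only, the combination of Nesterov acceleration and the proximal method used for the activation update can be sketched generically in Python. This is not a reproduction of Algorithm 2: the sketch assumes the element-wise thresholding function takes the common non-negative soft-thresholding form max(x − λ, 0), it takes the stepsize as a caller-supplied constant rather than deriving it, and the names `soft_threshold_nonneg` and `accelerated_proximal_gradient` are hypothetical.

```python
def soft_threshold_nonneg(x, lam):
    """Element-wise shrinkage max(x - lam, 0) (an assumed form of the threshold)."""
    return [max(v - lam, 0.0) for v in x]


def accelerated_proximal_gradient(grad, h0, step, lam, iters=100):
    """Minimize f(h) + lam * ||h||_1 subject to h >= 0.

    grad is a callable returning the gradient of the smooth part f at a point.
    """
    h_prev = list(h0)
    y = list(h0)            # extrapolated point used for the gradient step
    t_prev = 1.0
    for _ in range(iters):
        g = grad(y)
        # Gradient step on the smooth part, then shrink toward zero.
        h = soft_threshold_nonneg(
            [yi - step * gi for yi, gi in zip(y, g)], step * lam
        )
        # Nesterov momentum: extrapolate along the latest change in h.
        t = (1.0 + (1.0 + 4.0 * t_prev * t_prev) ** 0.5) / 2.0
        y = [hi + ((t_prev - 1.0) / t) * (hi - pi) for hi, pi in zip(h, h_prev)]
        h_prev, t_prev = h, t
    return h_prev
```

The shrinkage step is what drives small activations exactly to zero, which is consistent with the sparsity that the exponential prior is said to enforce.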
Specifying Parameters. Algorithm 1 has two parameters, Tw and K, that need to be supplied by a user. The filter scale Tw can be conveniently specified as any small number (e.g., letting Tw≈D) without the need to worry about overlooking the structures at larger scales. This is because the high-level structures with larger scales are meant to be captured by the CBMs at higher levels.
Regarding the number of filters K, since CBM has a natural corresponding probabilistic model (i.e., P(X,h) according to Equation 3), a naive method is trying out a range of different K's and selecting the one that produces the highest Bayesian Information Criterion (BIC), where the latter is a standard metric for model selection. Doing so, however, is very expensive because it requires training a large number of CBMs. Therefore, the following three-step method is proposed for selecting K:
- 1. Pick a large K and train a CBM.
- 2. Sort all filters such that:
-
- 3. Plot the cumulative activation function F(m):
-
-
- and pick the new K as the position m* such that F(m*) starts to saturate (i.e., when dF/dm≤ε where 0<ε≪1 is a small positive number).
The idea behind this method is that, since sparsity is enforced on the $h_k$ using the one-norm in Equation 6, the irrelevant filters $\{W_{m^*+1}, \ldots, W_K\}$ will all have very low activations compared to those of the relevant filters $\{W_1, \ldots, W_{m^*}\}$. The advantage of this approach is that it requires training only one CBM (instead of a large number of them), and hence it is much more efficient.
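The three-step selection of K can be sketched in Python as shown below. The sketch makes assumptions that should be read as such: each filter is summarized by the total one-norm of its activation vectors over all samples, the cumulative activation function is normalized to end at 1, and the derivative test is replaced by a discrete difference between consecutive values. The names `cumulative_activation` and `select_num_filters` are hypothetical.

```python
def cumulative_activation(activation_totals):
    """Return F(1), ..., F(K): normalized cumulative activation, largest filters first."""
    ranked = sorted(activation_totals, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("at least one filter must have positive activation")
    F, running = [], 0.0
    for a in ranked:
        running += a
        F.append(running / total)
    return F


def select_num_filters(activation_totals, eps=0.01):
    """Pick the number of filters kept once F stops growing by more than eps."""
    F = cumulative_activation(activation_totals)
    prev = 0.0
    for m, value in enumerate(F, start=1):
        if value - prev <= eps:     # discrete stand-in for dF/dm <= eps
            return m - 1
        prev = value
    return len(F)
```

With activation totals of 50, 30, 19.7, 0.2, and 0.1, for instance, the last two filters each add well under one percent to F, so the sketch keeps three filters.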
RCBM: Recursive CBMs. To capture the compositional structure of social dynamics across different scales, RCBM, which is a hierarchical architecture constructed by stacking together multiple CBMs (as illustrated in
Suppose a CBM has been trained with K=3 following the procedures described above, like the Level 1 CBM in
where $T+T_w-1$ is the length of $h_{k,1}$. Moreover, the values of $X_2$ will be assigned as $X_2[d,t]=\max_{s\in\{1,\ldots,c\}} h_{d,1}[c(t-1)+s]$.
After doing max-pooling for each sample, a set of level-2 dynamics (i.e., $X_2$) is obtained for the whole dataset. These level-2 dynamics can then be treated as a set of new social dynamics and used to train another CBM as before, like the Level 2 CBM in
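A minimal Python sketch of the max-pooling rule and of the stacking of levels follows. It is illustrative only: indices are zero-based (so the window for output column t covers positions c·t through c·t+c−1), a trailing partial window is simply dropped, which is an assumption, and `learn_cbm` is a hypothetical caller-supplied function that stands in for the CBM learning procedure.

```python
def max_pool(activations, c):
    """Build the next-level dynamic: row d is activation vector d pooled by factor c."""
    pooled = []
    for h in activations:
        n_windows = len(h) // c     # assumption: drop a trailing partial window
        pooled.append([max(h[c * t: c * t + c]) for t in range(n_windows)])
    return pooled


def stack_levels(X, learn_cbm, c, num_levels):
    """Train one model per level, feeding pooled activations upward.

    learn_cbm(X) is a hypothetical callable returning (filters, activations)
    for a D x T input dynamic.
    """
    levels = []
    for _ in range(num_levels):
        filters, activations = learn_cbm(X)
        levels.append(filters)
        X = max_pool(activations, c)    # input dynamic for the next level
    return levels
```

Note that a level trained with K filters hands a K-row dynamic to the next level, so the dimensionality seen by each level is set by the number of filters below it.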
RCBM Features. While RCBM inherits all the features of CBM, it has two additional features that are reflected in its name. First, all levels of an RCBM share the same structure, hence the name “recursive”. This ensures that the numbers of activation vectors remain roughly the same across different levels. This is in sharp contrast to other convolutional deep architectures, where the number of activation vectors becomes $K^2$ from the second level; this seriously limits the efficiency and scalability of previous algorithms. Second, by using Equation 3, the joint probability of the entire RCBM can be decomposed using Bayes' rule:
hence the name “Bayesian.” Moreover, it is noted that RCBM is normalized locally according to Equation 3. Therefore, the partition function Z in Equation 12 can be calculated efficiently using Equation 3 and the first line of Equation 12; this makes various inferences of RCBM efficient. Finally, such a probabilistic formulation also enables many new applications such as conditional inferences and anomaly detection.
Model Summary. To summarize, RCBM possesses three attractive properties:
- Good solution quality: under assumptions A1 to A3, RCBM is capable of identifying compositional structures of social dynamics that have provable convergence qualities. This is attributed to our specialized convolution operators (⊗ and ⊙) and stepsize assignment (Equation 9).
- Efficiency: the learning of RCBM is efficient and can scale much better than existing convolutional deep learning methods. This is attributed to our specialized convolution operators, stepsize assignment, and the recursive structure.
- Wide applicability: RCBM can be applied to a range of applications. In one example, it can be used as the feature extractor for supervised tasks. In another example, its probabilistic formulation (Equation 12) enables various conditional inferences and anomaly detection.
While all these properties are verified empirically, the first two are properly established in U.S. Provisional Patent Application Ser. No. 62/388,074, filed Jan. 15, 2016, and titled “PATTERN IDENTIFICATION AND DATA-DRIVEN ENGINEERING OF SOCIAL DYNAMICS,” which is incorporated by reference herein in its entirety and for its analyses of the convergence properties and runtime complexity of the RCBM implemented herein.
Experimental Results for RCBM Applications
Extensive experiments were conducted using RCBM and historical time-series social media data from the TWITTER® and YELP® platforms in the following three directions: (1) the evaluation of RCBM per se, (2) compositional structures in TWITTER® and YELP® datasets discovered using RCBM, and (3) two new applications enabled by RCBM.
Dataset Descriptions
TWITTER® Dataset. A TWITTER® dataset that consisted of 181M postings from 40.1M users and 1.4B following relationships was used. Whereas a few previous authors use hashtags to enumerate the information tokens that carry social dynamics (as defined herein), it was found that the discussion of many interesting events does not include a hashtag. Therefore, a more general definition using bursty keywords, i.e., keywords that attract intense attention during a short period of time, was adopted. Common terms (e.g., “the”, “and”, etc.) were removed and a classic method was used to detect bursty keywords. A total of 0.5M bursty keywords were detected, and their corresponding social dynamics were extracted. For better representativeness, the dynamics with at least 5 per-minute peak usages and 20 total usages around the 30 minutes during their peak times were selected, yielding a 13K-sample dataset of social dynamics.
Each social dynamic was characterized using seven features based on the types of users involved and certain graph statistics. For features based on the types of users involved, five types of users were considered. “Initiators” denoted the users who used the keyword before any of their friends did. “First-time propagators” and “first-time commentators” denoted the users who retweeted and tweeted, respectively, about the keyword after their friends had used the same keyword. “Recurring propagators” and “recurring commentators” denoted the users who retweeted and tweeted, respectively, a keyword that they themselves had used before. For graph statistics, the evolving graph corresponding to each keyword's usages was built, and the graph's diameter and the size of the largest connected component (LCC) were used, which others have shown to be informative.
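The five user types defined above can be turned into per-minute count series with a few lines of Python, as sketched below. The sketch is an illustration under assumptions: each usage event is assumed to arrive already annotated with whether it is a retweet, whether the user has used the keyword before, and whether any friend has; a user's own earlier usage is assumed to take precedence over friends' usage; and the names `classify_user` and `user_type_series` are hypothetical.

```python
USER_TYPES = [
    "initiator",
    "first-time propagator",
    "first-time commentator",
    "recurring propagator",
    "recurring commentator",
]


def classify_user(is_retweet, used_before, friend_used_before):
    """Label one keyword usage with one of the five user types."""
    if used_before:     # assumption: own history takes precedence
        return "recurring propagator" if is_retweet else "recurring commentator"
    if friend_used_before:
        return "first-time propagator" if is_retweet else "first-time commentator"
    return "initiator"


def user_type_series(events, num_minutes):
    """Count usages per minute for each user type.

    events is an iterable of (minute, is_retweet, used_before, friend_used_before).
    """
    series = {name: [0] * num_minutes for name in USER_TYPES}
    for minute, is_retweet, used_before, friend_used_before in events:
        label = classify_user(is_retweet, used_before, friend_used_before)
        series[label][minute] += 1
    return series
```

The five resulting series would supply five of the seven rows of a keyword's dynamic, with the graph diameter and LCC size supplying the other two.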
YELP® Dataset. The YELP® dataset consisted of 1.1M reviews made by 252K users (with 956K friendship edges among them) during a ten-year period. The target of these reviews were 42K businesses in Las Vegas, Phoenix, Edinburgh, Madison, and Waterloo; each of these businesses was considered as an information token. For better representativeness, the businesses with at least 40 reviews (i.e. one review per season, on average) were selected, yielding a 5.3K dataset of social dynamics. Each social dynamic was characterized using six evolving statistics of a business: its numbers of reviews and tips, its average relative rating, the experience (measured by the number of previous reviews) and influence (measured by the number of friends) of the business's reviewers, and the amount of user responses (that tag each review as useful, funny, or cool). Similarly, these dimensions provided good interpretability to the compositional structures found in the YELP® dataset.
Evaluation of RCBM
Parametric Forms. The distributional assumptions made in Equation 3 were first verified. To this end, each of the two datasets was used to train a one-level CBM. For each sample X, the per-sample error ∥X − Σ_k W_k ⊗ h_k∥_F and the per-sample activation Σ_k ∥h_k∥_1 were calculated. Then their empirical distributions were compared to their model distributions (i.e., according to Equation 3). From the results in
Runtime and Solution Quality. The runtime performance and the solution quality of RCBM were then evaluated against deep-learning and non-deep-learning methods. For the baseline deep-learning method, a convolutional autoencoder (cAE) was used, as it represents the state-of-the-art convolutional deep-learning algorithm. For the current method, two versions of RCBM were tested: one determines the stepsizes using line search (RCBM-LS); the other uses the proposed fixed stepsize in Equation 9 (RCBM-FS). Using each method, the sample sizes were varied in the range of 100 to 10000 and a two-level model with 10 filters at each level was trained. The solution quality of the learned models was measured using perplexity calculated on 3000 randomly sampled held-out test samples. Intuitively, perplexity measures how closely the model distribution resembles the empirical distribution, where a lower value indicates a better model. All experiments were run using 10 repetitions, and both the means and the standard deviations were reported.
From the left panels of
For solution quality, it can be observed from the right panels of
To gain further insight, the current method (i.e., RCBM-FS) was compared against two non-deep-learning methods that also incorporate latent factors, i.e., state space model (SSM) and sparse coding (SC). For a fair comparison, SSM and SC were set up such that each of them had an equal or slightly larger number of parameters compared to that of RCBM-FS. Similar to
In terms of runtime (i.e., the left panels of
In terms of solution quality, (see the right panels of
Efficient Selection of K. Next, the naive and the current methods in selecting the best number of filters K were compared; both methods are described above. With each of the two datasets, two-level RCBM's were trained with both methods. For the naive method based on Bayesian Information Criterion (BIC), the BIC was calculated while fixing K=10 for one of K1 and K2 and varying the other; this required training 20 RCBM's in total. For the current method, only one RCBM was trained using K1=K2=10 while plotting the cumulative activation function F(m) in Equation 11 for both levels. The results are summarized in
The compositional structures of social dynamics were investigated by inspecting the learned filters (i.e., W's in Equation 12) in RCBM. It is first noted that this analysis is in sharp contrast with analyses presented by others. First, the goal in other analyses is finding representative samples, which is essentially clustering; the goal in the present case, on the other hand, is finding structures that best characterize social dynamics, which is essentially decomposition. Second, the current method is compositional and scale-free.
Compositional Structures in the TWITTER® Dataset. For the TWITTER® dataset, K1=K2=5 was used according to the experiment in
Level 1 Structures in the TWITTER® Dataset. The filter W1,1 in
Level 2 Structures in the TWITTER® Dataset. We now turn to investigate the level-2 filters as visualized in
The filter W2,1 (baseline is indicated by 15a) characterizes a three-stage structure that is driven mainly by popularity (15b), but accompanied by different structures in each of its stages. It is accompanied firstly by contagiousness (15c), secondly by interactivity (15e) and stickiness (15d), and thirdly by combinations of the three. The contagiousness dips in the second stage, but gets enhanced again in the third stage, suggesting that contagiousness alone is not enough to sustain long-lasting social dynamics. The filter W2,2 is mainly composed of strong contagiousness, which dips at around time t=12, and is later continued and enhanced by interactivity and stickiness. Manual inspection shows that the contagiousness results from reporting some facts before t=12, whereas it results from commenting about the facts, for example, by famous bloggers, after t=12. The filters W2,3 and W2,4 are also driven by contagiousness, but their corresponding contagiousness components have a smaller magnitude. The key difference between the two is that in W2,3, strong interactivity and stickiness are generated as a result of the initial contagiousness, whereas this effect is much weaker in the case of W2,4. As a result, the dynamics with the top 10% W2,3 activations reach audiences more than three times larger than the dynamics with the top 10% W2,4 activations. The filter W2,5 exhibits a clear two-stage structure. The second stage, characterized by contagiousness (15c), seems to result from the first stage, which is characterized by strong stickiness. Manual inspection shows that such a structure consists of committed core users and responsive peripheral users, which is consistent with the leader-follower pattern reported by others. In the present method, however, the local structures of the pattern as well as the interaction among these structures are decomposed and analyzed in much greater detail.
To summarize, three representative ways were found where smaller-scale signatures interacted and formed larger-scale structures with higher-level meanings. First, it was found that popularity can stimulate interactivity, stickiness, and contagiousness (i.e. W2,1). Second, it was found that contagiousness can generate interactivity and stickiness, which, in turn, enhance contagiousness (e.g., W2,2 and W2,3). Third, it was found that stickiness beyond a certain threshold can generate contagiousness (e.g., W2,5).
Compositional Structures in the YELP® Dataset. For the YELP® dataset, K1=6 and K2=4 were used according to the experiment in
Level 1 Structures in the YELP® Dataset. Each level-1 structure indicates a particular level of rating (16c in
Level 2 Structures in the YELP® Dataset. The level-2 structures of YELP® social dynamics are summarized in
To summarize, representative ways were found where the ratings from the average and the elite YELP® reviewers can interact in different time scales. Particularly, three common long-term structures seem to have emerged: (1) low ratings by many average users, (2) high ratings by many elite users, and (3) sharp disagreement and transitions in the ratings between the average versus elite users.
Exemplary Applications of RCBM
Anomaly Detection. An advantage of RCBM is its probabilistic formulation (i.e., Equation 12) that assigns a probability to every sample social dynamic. Therefore, a natural application is to detect abnormal social dynamics with extremely low probabilities. A list of such anomalies detected in the TWITTER® dataset is summarized in the table of
Anomalies in the TWITTER® Dataset. The anomalies detected in the TWITTER® dataset (see the table of
When analyzing these anomalies, a legitimate question is whether these anomalies can be trivially detected by frequency-based rankings. It turns out that the list in the table of
Anomalies in the YELP® Dataset. The anomalies detected in the YELP® dataset (see the table of
Feature Extraction and Forecasting. When deep learning is used as the unsupervised feature extraction module in Computer Vision and Natural Language Processing, it produces state-of-the-art results in various supervised learning tasks. Similarly, RCBM's potential is explored herein for supervised learning in social applications. For the TWITTER® dataset, an attempt was made to forecast the total number of users of a hashtag; for the YELP® dataset, an attempt was made to forecast the average daily checkins of a business during 2014.
For each dataset, a two-level RCBM was built using a training set. Then, for each testing sample, its activation vectors were obtained using Algorithm 1, above. To prevent the use of unavailable information during forecasting, for the TWITTER® dataset, all samples up to the end of November were used as the training set, and all samples in December as the testing set. Also, for each test sample, only the data up to its peak usage time was used. Similarly, for the YELP® dataset, the prediction of the 2014 average checkins was made based on the information up to the end of December 2013. For the prediction models, the Vector Auto-Regressive Moving Average (VARMA) and Support Vector Regression (SVR) were used as representative linear and nonlinear models, respectively. For features, the seven-dimensional features in
The results are summarized in the tables of
As mentioned above, some aspects of the present invention are directed to methods of engineering outcome dynamics of a dynamic system. As seen in
Throughout this disclosure and in the appended claims, observation dynamics are represented as a D-dimensional time series X∈RD×T
To aid in reader understanding of method 2500 of
When an image dictionary is available, it can comprise a collection of images of suspect items (e.g., guns, knives, etc.), indicia of suspect organizations (e.g., black flags), and photographs of people having criminal records. For example, a person having a criminal record may post tweets praising violence, and the tweet may contain a “selfie” (i.e., a photograph of that person taken by that person) and/or one or more tools for committing a crime. An image dictionary (again, non-time-series data) can be used to do a sparse representation for the posted images. The higher intensity, the more likely the posted images pose a threat to others. It is noted that occluded images (e.g., the person is wearing sunglasses making her eyes unseen) and corrupted images (e.g., poor quality (noisy) images) can be expressed as sparse linear combinations of images in the image dictionary, plus errors due to occlusion and corruption. If a posted image is not in the image dictionary, its sparse coding coefficients will be spread widely among multiple subjects in the image dictionary. If an image dictionary is not available, image recognition becomes more difficult, but it is still possible to identify objects. For example, using a suspect organization's black flag as an example, multiple images can be taken and averaged to obtain an obscure version of black flags (denoted as a prior image M), which is going to be low rank. Then, low rank approximation can be used to see if the tweeted image may be a black flag. These are but a few examples. Those skilled in the art will readily appreciate the wide variety of scenarios in which non-time-series data, such as non-time-series data 2624A, can be used for multidimensional pattern identification.
In some embodiments, dynamic system 2604 may also include one or more real-world social activities 2628, which can be virtually any activity in which at least one person interacts with one or more other people and/or influences one or more people by her/his actions. Examples of real-world social activities that the one or more real-world social activities 2628 can be are virtually limitless and include visiting a shopping mall or other venue, traveling by vehicle, participating in a parade, watching a sports match, etc., among many, many others. Reasons why dynamic system 2604 may optionally include one or more of each of non-social-media time-series data source 2620, non-time-series data source 2624, and real-world social activities 2628 will become apparent upon reading the rest of this description of outcome-dynamics-engineering scenario 2600.
Dynamic-engineering system 2608 includes one or more algorithms 2632 that perform at least method 2500 of
At step 2510 (
At step 2515 (
At step 2520 (
An output of outcome-dynamics-influencing system 2612 (
As noted above, methodologies disclosed herein can be used for applications having non-social-media data sets. Two particular examples of other applications are human behavior modeling and macroeconomics pattern mining, which are described in detail in sections 6.1 and 6.2, respectively, in Huan-Kai Peng, “Understanding and Engineering Social Dynamics” PhD diss. Carnegie Mellon University, Pittsburgh, Jan. 31, 2016, which is incorporated herein by reference for these specific examples and related descriptions. Other exemplary applications include improving movement prototype mining, activity recognition, and anomaly detection when applied to sensor-based human behavior modeling. The inventors have also shown that the methodologies disclosed herein can be used to identify some meaningful macroeconomics patterns that may affect GDP growth.
As those skilled in the art will readily appreciate, outcome-dynamics-engineering scenario 2600 (
The present inventors have determined that the dynamics-engineering problem described above relative to
Score Function. To define the dynamics engineering problem formally, let Y=[UV]∈RD×(T
Note that this concatenation is merely for mathematical convenience: U and V still differ in their meanings and in the kinds of properties their corresponding solutions (U* and V*) are desired to satisfy.
Moreover, let y=vec(Y) denote its vectorization (i.e., its transformation into a column vector):
Using the same example above, we have
Accordingly, the engineering problem can be reformulated as maximizing a score function defined as:
$\mathrm{score}(y) = y^{\top} B y + d^{\top} y \qquad (14)$
wherein B and d define the quadratic and linear parts of the score function, respectively. It is noted that this quadratic score function is general, in the sense that different goals can be achieved using various special cases.
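By way of non-limiting illustration, the quadratic score function of Equation 14 can be sketched in a few lines of Python. The helper names `vec` and `quadratic_score` are hypothetical, and the column-by-column stacking order used in `vec` is an assumption (it is the conventional definition of vectorization, but the original worked example is not reproduced here).

```python
def vec(Y):
    """Stack the columns of Y (a list of rows) into one flat vector.

    Column-major stacking is assumed here; it is the usual meaning of vec().
    """
    rows, cols = len(Y), len(Y[0])
    return [Y[r][c] for c in range(cols) for r in range(rows)]


def quadratic_score(y, B, d):
    """Evaluate score(y) = y^T B y + d^T y for plain Python lists."""
    n = len(y)
    quad = sum(y[i] * B[i][j] * y[j] for i in range(n) for j in range(n))
    lin = sum(d[i] * y[i] for i in range(n))
    return quad + lin


if __name__ == "__main__":
    y = vec([[1, 2], [3, 4]])
    zero_B = [[0] * 4 for _ in range(4)]
    # With B = 0 the score reduces to the linear part d^T y.
    print(quadratic_score(y, zero_B, [1, 1, 1, 1]))
```

Setting B to zero yields a purely linear score, while a negative-definite B penalizes large values of y; these are the kinds of special cases the general form is meant to cover.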
Formal Dynamics-Engineering Problem. While maximizing the score function, two implicit assumptions are made: (1) there exists a temporal dependency among X and Y=[U V], and (2) the solution we come up with needs to follow that dependency. Using notation in the table of
Herein, log P(⋅) denotes the log-likelihood under a probabilistic model that captures the temporal dependencies of the social dynamics. In other words, while the second term (i.e., the score function) takes care of the specific engineering task, the first term (i.e., log P(Y|X)) makes sure that the solution still conforms with the temporal dependency of the social dynamics. Moreover, λ≥0 is a balancing parameter that controls the relative importance of fitting the probability distribution P(⋅) versus maximizing the score. Of note, the selection of λ is important and is described below. It is noted that this proposed problem definition is general yet precise. Indeed, it can incorporate any combination of P(⋅) and score(Y) functions, in which each different combination corresponds to a different engineering task. Also, once this combination is given, the engineering problem is mathematically precise.
Exemplary Probabilistic Model
In principle, any probabilistic model of social dynamics can be plugged into the likelihood term P(⋅) in Equation 15, above. However, the present inventors have found that a particular probabilistic model, namely the RCBM described above relative to pattern identification, is particularly effective.
According to RCBM, the basic generation process for dynamics X is:
More specifically, RCBM assumes that dynamics X (or more generally, the concatenation of [X U V]) are generated from making “scaled copies” of the filter matrices Wk's, where the time shift and the scaling of these copies are determined by the sparse activation vectors hk's. Such a “scale-and-copy” operation is carried out using the operator ⊗ in Equation 16, above, which denotes a dimension-wise convolution defined as:
It is noted that this operator differs from the conventional matrix convolution. Effectively, ⊗ does D 1-D convolutions between each row of W and the entire h, and puts back the results to each row of the output matrix separately.
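By way of non-limiting illustration, the row-by-row behavior of the operator ⊗ described above can be sketched as follows. The function names are hypothetical, and the use of a full-length 1-D convolution (output length equal to the filter width plus the activation length minus one) is an assumption; the defining equation may use a different boundary convention.

```python
def conv1d_full(a, h):
    """Full 1-D convolution of two sequences."""
    out = [0.0] * (len(a) + len(h) - 1)
    for i, ai in enumerate(a):
        for j, hj in enumerate(h):
            out[i + j] += ai * hj
    return out


def dimwise_conv(W, h):
    """Convolve every row of the filter matrix W with the same activation vector h.

    Each of the D rows is handled independently, so the output keeps D rows.
    """
    return [conv1d_full(row, h) for row in W]


if __name__ == "__main__":
    W = [[1.0, 2.0],
         [0.0, 1.0]]          # D = 2 features, filter width 2
    h = [1.0, 0.0, 0.5]       # copy at t=0 with scale 1, copy at t=2 with scale 0.5
    for row in dimwise_conv(W, h):
        print(row)
```

In this example the sparse vector h places one full-scale copy of the filter at the first time step and one half-scale copy two steps later, which is the "scale-and-copy" reading of the operator.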
By stacking multiple levels of the basic form in Equation 16, a deep-learning architecture is constructed:
A key to this construction is building the upper-level dynamics X_l by max-pooling the lower-level activation vectors h_{l−1,k}. This essentially takes the maximum value over c consecutive values of the lower-level activation vectors. This operation introduces non-linearity into the model, which is key for the superior performance of deep-learning methods.
Exemplary Score Functions
Pattern matching: To achieve an ideal outcome Vref while minimizing the cost associated with the required intervention U, one can maximize the following score function:
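By way of non-limiting illustration, the max-pooling step can be sketched as follows. The function names are hypothetical, and the use of non-overlapping windows of length c (with a shorter final window when the length is not a multiple of c) is an assumption.

```python
def max_pool(h, c):
    """Take the maximum over each block of c consecutive values of h."""
    return [max(h[i:i + c]) for i in range(0, len(h), c)]


def upper_level_dynamics(H, c):
    """Build the next level's input by pooling each lower-level activation vector.

    H is a list of K activation vectors; the result has K rows, one per filter.
    """
    return [max_pool(h, c) for h in H]


if __name__ == "__main__":
    H = [[0, 3, 1, 0, 0, 2],
         [5, 0, 0, 0, 1, 1]]
    print(upper_level_dynamics(H, 2))
```

Because each pooled value depends only on the largest activation within its window, the time axis is shortened by a factor of c, which is how each level comes to describe a coarser temporal scale.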
The first term denotes mismatch and will force V to match Vref; the second term denotes cost and will typically force the values in U to be small. Here ρ∈[0, 1] controls the relative importance of mismatch versus cost. Moreover, Ccost encodes the relative expense of controlling different features at different times, whereas ⟨U,C⟩=ΣijUijCij denotes the dot product between U and C. For example, suppose Ccost=[1 1; 2 2] and U=[1 2; 3 4] (rows separated by semicolons), then ⟨Ccost,U⟩=1×1+1×2+2×3+2×4=17. Returning to the TWITTER® example above, suppose that the first row of U represents the numbers of propagators (i.e., one propagator at t=1 and two propagators at t=2) and that the second row represents the numbers of commentators (i.e., three commentators at t=1 and four at t=2), then assigning Ccost=[1 1; 2 2] is equivalent to specifying that it is twice as expensive to grow the number of commentators (TWITTER® users who spend time to leave comments) as to control the number of propagators (who simply click “retweet”), regardless of time. Finally, we note that Equation 19 is a special case of Equation 14. To check this, the second line of Equation 19 can be rewritten using B=(1−ρ)ÎvTÎv, d=vec([−ρCu 2(1−ρ)Vref]), and Îv=ID⊙([0T
Profit maximization: To maximize the reward associated with the outcome V while minimizing the cost associated with the intervention U, one can maximize the following score function:
The first term denotes cost and will typically force the values in U to be small; the second term denotes reward and will typically force the values in V to be large. Similarly to the above task, Ccost is used to encode the relative cost and Creward is used to encode the relative reward of different dimensions and times. Following the above TWITTER® example, assigning
is equivalent to specifying that it is three times more rewarding to acquire a user (either a propagator or a commentator) at t=2 compared to acquiring a user at t=1, regardless of the type of the user. Like the case of Equation 19, ρ controls the relative importance of cost versus reward. It is noted that Equation 20 is another special case of Equation 14. To check this, the second line of Equation 20 can be rewritten using d=vec([−ρCcost (1−ρ)Creward]).
RCBM Formulation of Exemplary Dynamics-Engineering Problem
By writing down the conditional probability P(Y|X) using the joint probability specified in Equation 18 and then plugging P(Y|X) into the first term of Equation 15, the optimization problem in Equation 15 can be explicitly written as:
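By way of non-limiting illustration, the two exemplary score functions can be sketched as follows. The function names are hypothetical. The exact forms used below are assumptions inferred from the linear terms stated above (a cost weighted by ρ, and a squared mismatch or a reward weighted by 1−ρ), and the reward matrix in the usage example is a hypothetical one that simply makes the second time step three times as rewarding as the first.

```python
def inner(A, B):
    """Entry-wise dot product <A, B> of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def pattern_matching_score(U, V, V_ref, C_cost, rho):
    """Assumed form: -(1 - rho) * ||V - V_ref||_F^2 - rho * <C_cost, U>."""
    mismatch = sum((v - r) ** 2
                   for rv, rr in zip(V, V_ref) for v, r in zip(rv, rr))
    return -(1 - rho) * mismatch - rho * inner(C_cost, U)


def profit_score(U, V, C_cost, C_reward, rho):
    """Assumed form: -rho * <C_cost, U> + (1 - rho) * <C_reward, V>."""
    return -rho * inner(C_cost, U) + (1 - rho) * inner(C_reward, V)


if __name__ == "__main__":
    C_cost = [[1, 1], [2, 2]]    # commentators (row 2) cost twice as much
    U = [[1, 2], [3, 4]]         # propagators (row 1), commentators (row 2)
    print(inner(C_cost, U))      # cost of the intervention in the text's example
    C_reward = [[1, 3], [1, 3]]  # hypothetical: t=2 is three times as rewarding
    V = [[1, 1], [1, 1]]
    print(profit_score(U, V, C_cost, C_reward, rho=0.5))
```

Both functions are to be maximized, so cost and mismatch enter with negative signs while reward enters with a positive sign.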
Here, a two-level RCBM is presented for illustration purposes, though the optimization formulation for a multilevel RCBM can be similarly derived. The max-pooling operation MP(⋅) is defined as:
wherein h1 is the vector concatenation of {hik}k=1K
To solve the difficulty resulting from max-pooling, the following convex relaxation may be used:
The idea behind this relaxation consists of introducing a new variable S as the surrogate of MP(⋅). Furthermore, the equality constraints specified in Equation 22 may be substituted with two sets of inequality constraints, i.e.,
In other words, instead of forcing S to equate
i.e., the maximal value among the consecutive c values, we now constrain it to be larger than or equal to the maximal value, but smaller than or equal to the sum of those c values.
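By way of non-limiting illustration, the pair of inequality constraints described above can be checked numerically as follows. The function name is hypothetical, non-overlapping windows of length c are assumed, and the activation values are assumed to be nonnegative so that the maximum of a window never exceeds its sum.

```python
def relaxed_pool_feasible(S, h, c, tol=1e-12):
    """Check max(window) <= S_j <= sum(window) for every pooling window of h."""
    windows = [h[i:i + c] for i in range(0, len(h), c)]
    if len(S) != len(windows):
        return False
    return all(max(w) - tol <= s <= sum(w) + tol for s, w in zip(S, windows))


if __name__ == "__main__":
    h = [0.0, 3.0, 1.0, 0.5]   # nonnegative lower-level activations
    print(relaxed_pool_feasible([3.0, 1.2], h, c=2))  # inside both bounds
    print(relaxed_pool_feasible([2.5, 1.2], h, c=2))  # below the first maximum
```

When a window contains a single nonzero activation, its maximum and its sum coincide, so the relaxation is tight for sufficiently sparse activation vectors.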
It is noted that the problem in Equation 23 is now jointly convex in Y, h1, h2 and S, since the objective function is convex and all constraints are linear. Moreover, since the objective is differentiable, a possible approach to solve Equation 23 is using the proximal method. It turns out, however, that the projection functions corresponding to the constraints in Equation 23, which are required in the proximal method, are difficult to derive.
The issue may be solved by noting that the objective function of Equation 23 is quadratic with only linear constraints. Therefore, in principle, there exists a quadratic programming (QP) transformation that is equivalent to Equation 23. The explicit form and the mathematical details of this QP transformation are described in the S1 Appendix of Peng et al., “Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization,” http://dx.doi.org/10.1371/journal.pone.0146490, PLOS, (Jan. 15, 2016), which is incorporated herein by reference. It is noted that, since the problem is jointly convex, QP is guaranteed to find an approximate solution in polynomial time. In our experiments, the QP has around 1000 variables and the problem gets solved in just a few seconds.
Data-Driven Evaluation
For many methods in modeling and prediction, cross-validation is a standard way of evaluating solutions and selecting parameters. However, cross-validation cannot be directly applied to the current dynamics-engineering problem. Cross-validation works for modeling and prediction because the properties of a “good solution” are well known: a good modeling solution will have high data likelihood and a good prediction solution will be highly accurate. For the current dynamics-engineering problem, however, such a property is less obvious.
For the current dynamics engineering problem, it is asserted that a key property of a good solution may consist of combining a high score and a high validity, where the latter can be roughly defined as how well the solution is supported by historical samples that achieve high scores. To show that having a high score alone is not sufficient, consider the case when λ→∞ in Equation 15. In this case, the optimization will produce the highest possible score, while completely ignoring the likelihood term in Equation 15. As a result, the optimization will produce a solution that does not possess any inherent temporal dependency of the data. In this case, the projected outcome V* would be unlikely to happen in the real world even if the suggested intervention U* is implemented.
Validity. As mentioned above, the informal definition of validity is how well the solution is supported by historical samples that achieve high scores. To formally define validity γ, two important components are: (1) P̂ that denotes the density function capturing what the high-scoring dynamics look like in historical data, and (2) q0 that denotes a carefully chosen threshold. More precisely, P̂(⋅) and q0 may be constructed in four steps:
- 1. Evaluate the value of the score function using all historical samples {[Xi,Yi]}i=1N, rank them according to their evaluated values, and then keep only the N0 top-scoring samples.
- 2. Use the first half of the top-scoring samples to construct a kernel density estimator.
- 3. Use the second half of the top-scoring samples to choose the value of co that has the highest data likelihood.
- 4. Use the second half to calculate q0, such that only a small fraction (e.g., 5%) of samples among the second half has P̂(X, Y; h)<q0.
With P̂(⋅) and q0 defined, we can define the validity γ corresponding to a solution Y*(λ) (i.e., solution of Equation 15 using a specific value of λ) as:
Then, γ can be used as a convenient measure, such that γ≥0 indicates that, according to historical high-scoring data, the solution is “realistic enough.”
A main idea behind the above procedure is to construct P̂(X,Y) as a density estimator of the high-scoring historical samples, and then construct q0 as the quantile estimator (e.g., at 5%) for the empirical distribution of {Pi}i. Here Pi=P̂(Xi, Yi) denotes the value of P̂(⋅) evaluated using Xi and Yi. Therefore, when the density of a solution P̂([X,Y*(λ)]) is larger than q0 (i.e., when γ≥0 in Equation 24), this solution is called “realistic enough,” because it is more likely (i.e., more realistic) than the 5% most-unlikely high-scoring historical samples.
It is noted that in the construction of γ, and in particular, P̂(⋅) and q0, the entire training set is not used. The underlying reason is that a realistically good solution can be very rare. In other words, it is by design that validity should measure how well a solution is supported by historical samples that achieve high scores, instead of historical samples in general. Consequently, in principle, N0 should be selected as a small fraction (e.g., 10%) of the size of the historical samples.
It is also noted that P̂(X,Y) and q0 depend on the partitioning in the second and the third steps, which, according to Equation 24, can also affect γ(λ). A simple way to remove such a dependency is to use multiple random partitionings, obtain the corresponding copies of P̂(X,Y)'s and q0's, and then calculate the average value of Equation 24 using all these copies.
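By way of non-limiting illustration, the four-step construction of the density estimator and the threshold q0 can be sketched as follows. Several details are assumptions made only for this sketch: samples are flattened vectors, the kernel is an isotropic Gaussian with a single bandwidth, the two halves are formed by alternating the ranked samples rather than by random partitioning, and validity is taken as the log-density of a solution minus log q0, which is one form consistent with the stated property that γ≥0 exactly when the density is at least q0. All function names are hypothetical.

```python
import math


def gaussian_kde(train, h):
    """Return a density function: isotropic Gaussian kernels of bandwidth h."""
    dim = len(train[0])
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * h * h)

    def density(z):
        total = 0.0
        for s in train:
            sq = sum((a - b) ** 2 for a, b in zip(z, s))
            total += math.exp(log_norm - sq / (2.0 * h * h))
        return total / len(train)

    return density


def log_density(density, z, floor=1e-300):
    """Logarithm of the density, floored to avoid log(0) on underflow."""
    return math.log(max(density(z), floor))


def build_validity(samples, scores, n_top, bandwidths, frac=0.05):
    """Steps 1-4: rank by score, fit on one half, tune and threshold on the other."""
    # Step 1: keep only the n_top highest-scoring historical samples.
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    top = [samples[i] for i in order[:n_top]]
    # Assumed partitioning: alternate ranked samples between the two halves.
    first, second = top[0::2], top[1::2]
    # Steps 2-3: pick the bandwidth with the highest held-out log-likelihood.
    best_h = max(bandwidths, key=lambda h: sum(
        log_density(gaussian_kde(first, h), s) for s in second))
    density = gaussian_kde(first, best_h)
    # Step 4: q0 is a low quantile of the held-out densities.
    held = sorted(density(s) for s in second)
    q0 = held[min(int(frac * len(held)), len(held) - 1)]

    def gamma(z):
        return log_density(density, z) - math.log(max(q0, 1e-300))

    return gamma


if __name__ == "__main__":
    samples = [[x / 10.0] for x in range(100)]
    scores = [s[0] for s in samples]
    gamma = build_validity(samples, scores, n_top=20, bandwidths=[0.1, 0.3, 1.0])
    print(gamma([9.0]) >= 0, gamma([0.0]) >= 0)
```

Averaging γ over several random partitionings, as the text suggests, would amount to calling `build_validity` repeatedly with different splits and averaging the returned values.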
Selection of λ. With validity defined, λ may be selected. As mentioned above, a good solution should combine high validity and high score. A key observation from Equation 15 is that one can make the score larger by making λ larger. Therefore, while there may be many potential ways to do it, we propose the following method:
where the idea is that conditioned on the solution being (sufficiently) valid, its score is desired to be as high as possible. A TWITTER® dataset is used to demonstrate the interplay among λ, validity (γ), and score while engineering social dynamics in the next section.
Experimental Results for RCBM-Based Outcome Dynamics Engineering
For experimental results, the dataset, the overall setup, and two baseline methods are first described. Then, experimental results are presented on two engineering tasks: pattern matching and profit maximization.
Dataset Description
The dataset used in the experiments was a TWITTER® dataset that consisted of 181M postings from 40.1M users and 1.4B following relationships. With this dataset, hashtags were used to enumerate the information tokens that carry social dynamics. “Low-traffic” hashtags were filtered out by selecting only the ones with at least 100 total usages within the 90 minutes around their peak times, yielding a 10K-sample dataset of social dynamics. These samples were then sorted according to their peak time. The first 9K samples were used as the training set, i.e., for model training and the construction of P̂(⋅) and q0 (mentioned above), whereas the remaining 1K samples were reserved for testing. This data partitioning scenario ensured that all training data occurs prior to testing data, i.e., no “future data” was used while testing. For all hashtag samples, the dynamics were measured in units of 3 minutes, where the first 30 minutes were the observation dynamics (X), the middle 30 minutes were the intervention dynamics (U), and the last 30 minutes were the outcome dynamics (V).
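By way of non-limiting illustration, the selection rule can be read as a search over candidate values of λ that keeps only valid solutions and returns the highest-scoring one. The sketch below assumes a finite grid of candidates and three caller-supplied functions (`solve`, `validity`, and `score`), all of which are hypothetical placeholders rather than parts of the disclosed method.

```python
def select_lambda(candidates, solve, validity, score):
    """Return (lambda, score, solution) maximizing score subject to validity >= 0.

    Returns None when no candidate yields a valid solution.
    """
    best = None
    for lam in candidates:
        y = solve(lam)               # solution of the engineering problem for lam
        if validity(y) >= 0:         # keep only "realistic enough" solutions
            s = score(y)
            if best is None or s > best[1]:
                best = (lam, s, y)
    return best


if __name__ == "__main__":
    # Toy stand-ins: larger lambda raises the score but lowers validity.
    result = select_lambda(
        candidates=[0.1, 0.5, 1.0, 2.0],
        solve=lambda lam: lam,
        validity=lambda y: 1.0 - y,
        score=lambda y: y,
    )
    print(result)
```

In the toy example the score grows with λ while validity falls, so the rule settles on the largest λ whose solution is still valid.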
Each social dynamic was characterized using its five types of users. “Initiators” denote the users who used the keyword before any of their friends did. “First-time propagators” and “first-time commentators” denote the users who retweeted and tweeted, respectively, about the keyword after their friends had used it. “Recurring propagators” and “recurring commentators” denote the users who retweeted and tweeted, respectively, a keyword that they themselves had used before. Of note, this means that X, U, V∈R^{5×10}, because each variable has five features and ten timesteps (i.e., three minutes per timestep).
Setup
Experiments were conducted on two types of engineering tasks, namely, by solving Equation 15 using two distinct score functions: the one in Equation 19 for pattern matching and the one in Equation 20 for profit maximization. For pattern matching, Ccost in Equation 19 was set to the all-ones matrix of size D×Tu, which assumes a uniform intervention cost across time and across the different types of users. Similarly, for profit maximization, Ccost and Creward in Equation 20 were set to the all-ones matrices of sizes D×Tu and D×Tv, respectively. Of note, the assignment of Vref in Equation 19 depends on the particular experiment and is detailed below.
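By way of non-limiting illustration, splitting one 90-minute sample into its observation, intervention, and outcome parts can be sketched as follows. The function name and the representation of a sample as a list of five rows of thirty 3-minute bins are assumptions made for this sketch.

```python
def split_dynamics(series, t_x=10, t_u=10, t_v=10):
    """Split a D x (t_x + t_u + t_v) series into X, U, and V along the time axis."""
    total = t_x + t_u + t_v
    if any(len(row) != total for row in series):
        raise ValueError("every feature row must have %d time steps" % total)
    X = [row[:t_x] for row in series]
    U = [row[t_x:t_x + t_u] for row in series]
    V = [row[t_x + t_u:] for row in series]
    return X, U, V


if __name__ == "__main__":
    # Five user-type features, thirty 3-minute bins (90 minutes in total).
    series = [[r * 100 + t for t in range(30)] for r in range(5)]
    X, U, V = split_dynamics(series)
    print(len(X), len(X[0]), len(U[0]), len(V[0]))
```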
In order to analyze the interplay and tradeoffs critical to real-world engineering applications, for each task, analyses were conducted along the following four directions:
- 1. Interplay between validity γ (Equation 24) and the optimization parameter λ (Equation 15).
- 2. Tradeoff of individual terms in the score functions. In particular, for pattern matching (Equation 19), it includes cost (⟨Ccost,U⟩) and mismatch (∥V−Vref∥F²); for profit maximization (Equation 20), it includes cost (⟨Ccost,U⟩) and reward (⟨Creward,V⟩).
- 3. Comparison between “real” vs. engineered cases. A motivation behind this analysis is to quantify the potential benefits as a result of purposeful engineering, compared to what happened in reality.
- 4. A case study.
AR. A first baseline is to substitute the likelihood term in Equation 15 with another one using the Autoregressive model (AR). AR is commonly used in time-series forecasting and is defined as:
Here xt∈R^{D×1} denotes the multivariate features at time t; ϵt∼N(0, Σ) denotes independent and identically distributed multivariate Gaussian noise; and the Φi's denote the matrices modeling the dependency between the current dynamics and its history back to p steps, where we set p=10. Details of solving Equation 15 with the first term using AR are given in the S1 Appendix noted and incorporated herein by reference above. While this baseline fits perfectly in our proposed framework, its restrictive linear generative model may limit its performance.
NN. A second baseline is based on a nearest-neighbor (NN) search. The idea is to search within the training data for the top 5% of samples that are the most similar to the given observation X (using Euclidean distance). Then the solution Y* is obtained using the {U, V} part of the highest-scoring sample within that subset. The advantage of NN is that, unlike other methods, it does not have a concern about validity, i.e., whether the solution is realistic or not, because the solution is generated from real dynamics that happened in the past. However, its disadvantage is that not all historical dynamics match the observation X and maximize the score function at the same time. Consequently, the score of an NN solution may be low or unstable.
Experiment 1: Pattern Matching
In a first experiment, i.e., pattern matching, the observation Xi of every test sample was given and the aim was to match a single Vref. This Vref is defined as the average outcome dynamics of the top 2% of samples in the training set with the highest long-term popularity ∥V∥1. Dynamics engineering was conducted using all test samples and the resulting validity, cost, and mismatch were analyzed.
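By way of non-limiting illustration, the noise-free one-step prediction of a vector autoregressive model of order p can be sketched as follows. The sketch assumes the standard AR form in which the current features equal a sum of coefficient matrices applied to the p most recent feature vectors plus Gaussian noise; the function name and the nested-list representation of the coefficient matrices are hypothetical.

```python
def ar_predict(history, Phi):
    """Mean one-step prediction: sum over i of Phi[i-1] applied to x_{t-i}.

    history: list of past feature vectors, oldest first (at least p entries).
    Phi:     list of p coefficient matrices, each D x D, for lags 1..p.
    """
    p = len(Phi)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    D = len(history[-1])
    pred = [0.0] * D
    for lag in range(1, p + 1):
        x_lag = history[-lag]
        for r in range(D):
            pred[r] += sum(Phi[lag - 1][r][c] * x_lag[c] for c in range(D))
    return pred


if __name__ == "__main__":
    Phi = [[[0.5, 0.0], [0.0, 0.5]],      # lag-1 coefficients
           [[0.25, 0.0], [0.0, 0.25]]]    # lag-2 coefficients
    history = [[4.0, 8.0], [2.0, 4.0]]    # x_{t-2}, then x_{t-1}
    print(ar_predict(history, Phi))
```

Because the prediction is linear in the past observations, the model cannot represent the non-linear effects that max-pooling introduces in RCBM, which is the limitation the text attributes to this baseline.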
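By way of non-limiting illustration, the nearest-neighbor baseline can be sketched as follows. The function names, the representation of each training sample as an (X, U, V) tuple of matrices, and the rounding used to turn the 5% fraction into a neighbor count are assumptions made for this sketch.

```python
import math


def flatten(M):
    """Flatten a matrix (list of rows) into one list."""
    return [v for row in M for v in row]


def euclidean(A, B):
    """Euclidean distance between two equally sized matrices."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flatten(A), flatten(B))))


def nn_baseline(X_obs, train, score, top_frac=0.05):
    """Among the nearest top_frac of training samples, return the best-scoring (U, V)."""
    k = max(1, round(top_frac * len(train)))
    nearest = sorted(train, key=lambda s: euclidean(X_obs, s[0]))[:k]
    best = max(nearest, key=lambda s: score(s[1], s[2]))
    return best[1], best[2]


if __name__ == "__main__":
    # Toy training set: 40 samples with 1 x 1 dynamics.
    train = [([[float(i)]], [[float(i)]], [[float(i % 3)]]) for i in range(40)]
    U_star, V_star = nn_baseline([[10.2]], train, score=lambda U, V: V[0][0])
    print(U_star, V_star)
```

The returned solution is always a historical sample, which is why validity is not a concern for this baseline, but its score is limited to whatever happens to exist among the nearest neighbors.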
Validity (γ) vs. λ. In
Cost-Mismatch Tradeoff. In
Constrained Cost Minimization. In order to demonstrate the potential benefits of purposeful engineering, a slightly different setting was used. While for each test sample i, the observation part Xi was still given, we set Vref=Vi, i.e., its own outcome dynamics. This setting allowed us to compare the performance of the matching algorithms, in terms of cost, with what actually happened in reality, assuming that each test sample was actually performing a (perfect) matching task without consciously considering minimizing the cost.
Since the real case achieves a “perfect match,” the engineering algorithms needed to be constrained such that they can be compared on the same footing. Therefore, an additional constraint ∥V*−Vref∥1≤pDTv, wherein p=5%, was enforced. In other words, after going through every test sample, each algorithm had its own fraction of valid answers, and only the valid answers were compared to the same set of samples, in terms of cost, to the real case. For AR and the current method, a valid answer needed to satisfy this constraint on top of satisfying γ≥0.
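By way of non-limiting illustration, the additional mismatch constraint can be checked as follows. The function name is hypothetical, and the sketch simply compares the entry-wise absolute difference between the projected outcome and the reference with the budget p·D·Tv.

```python
def within_mismatch_budget(V_star, V_ref, p=0.05):
    """Return True when ||V_star - V_ref||_1 <= p * D * T_v."""
    D, T_v = len(V_ref), len(V_ref[0])
    l1 = sum(abs(a - b)
             for row_s, row_r in zip(V_star, V_ref)
             for a, b in zip(row_s, row_r))
    return l1 <= p * D * T_v


if __name__ == "__main__":
    V_ref = [[0.0] * 10 for _ in range(5)]    # D = 5 features, T_v = 10 steps
    V_star = [row[:] for row in V_ref]
    V_star[0][0] = 2.0                        # total deviation 2.0, budget 2.5
    print(within_mismatch_budget(V_star, V_ref))
```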
Case Study. To gain further insights, a test case was picked wherein all three methods produced valid solutions from the experiment of
The solution produced by NN, although it seems to match the real case in its general shape, produces a moderate mismatch. Further, the cost of its suggested intervention is the highest among the three. AR, on the other hand, produces very smooth dynamics that do not match the general shape of (the third part of) the real case, although the mismatch is quantitatively comparable to that of NN. Moreover, although its cost is relatively low, the dynamics do not look real: in fact, its solution validity γ is 0.02, i.e., barely above 0.
The current method produces a recommendation that best matches the third part of the real case, while also producing the lowest-cost intervention. A closer inspection shows that although the magnitude of the intervention dynamics (i.e., the second part) is generally low, it seems to consciously keep an interesting proportion and interaction among different types of users, i.e., initiators and first-time propagators around t=50, first-time commentators around t=55, and recurring commentators around t=65. This shows that the key features for successful dynamics engineering are not necessarily unique and may involve the interaction of multiple features. This is made possible because the proposed recommendation explicitly uses the patterns (i.e., the filters W in Equations 21 and 18) at different temporal scales that are learned directly from data. Consequently, the current method is able to recommend low-cost, good-matching solutions while still making the suggested dynamics follow the intrinsic temporal dependencies from the data.
Experiment 2: Profit Maximization
In the second experiment, i.e., profit maximization, the observation X of every test sample was given, and the aim was to maximize the long-term popularity (reward) ∥V∥1 with minimum cost ∥U∥1. Again, dynamics engineering was conducted using all test samples and the resulting validity, cost, and reward were analyzed.
Validity (γ) vs. λ.
Interestingly, there are also three observations in
Cost-Mismatch Tradeoff. In
Interestingly, there are also two observations in
Constrained Reward Maximization. To demonstrate the potential benefits of purposeful engineering, a slightly different setting is now used. While for each test sample i, the observation part Xi is still given, an additional constraint is enforced that a solution must produce a cost that is at most half of the actual cost of sample i, i.e., ∥Ui∥1, on top of achieving γ≥0, to be considered a valid answer. This setting allows comparison of the performance of the profit-maximization algorithms, in terms of reward and cost, against what actually happened in reality.
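One way to picture this comparison is the following Python sketch, which rejects a suggested solution unless it meets the half-cost and γ≥0 conditions and otherwise reports its cost and reward next to the real ones. The function name, argument names, and returned dictionary are hypothetical, introduced here for illustration only.

```python
from typing import Dict, Optional, Sequence


def l1(v: Sequence[float]) -> float:
    """L1 norm, used for both cost ||U||_1 and reward ||V||_1."""
    return sum(abs(x) for x in v)


def compare_to_real(u_star: Sequence[float], v_star: Sequence[float],
                    u_real: Sequence[float], v_real: Sequence[float],
                    gamma: float,
                    cost_ratio: float = 0.5) -> Optional[Dict[str, float]]:
    """Return cost/reward of a valid solution alongside the real case, else None."""
    cost, real_cost = l1(u_star), l1(u_real)
    # A valid answer needs gamma >= 0 and at most half of the actual cost.
    if gamma < 0 or cost > cost_ratio * real_cost:
        return None
    return {
        "cost": cost,
        "reward": l1(v_star),
        "real_cost": real_cost,
        "real_reward": l1(v_real),
    }
```

A result of `None` marks the answer as invalid for that test sample, so only the remaining answers contribute to the reward and cost comparison.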
Case Study. To gain further insights, a test case was picked where all three methods produce valid solutions from the experiment of
From
The merits of the current pattern matching and profit maximization are quite different. Indeed, from
Further, such a difference in formulation implies a difference in the fundamental difficulties of the two tasks. In particular, profit maximization is significantly more difficult because while the “cost” has a natural lower bound (i.e., zero), the “reward,” in principle, does not have any upper bound. In other words, unless the parameter λ is assigned perfectly, it is very easy to obtain either an invalid solution or a low-score solution. Therefore, in many engineering cases (e.g., marketing promotion), although profit maximization may be more desirable, in practice, pattern matching may be more useful.
The above analysis is confirmed by the experimental results in two ways. First, by comparing
These differences between the two tasks have practical implications for their real applications. First, if the ideal outcome pattern is given, pattern matching is the better option because according to
It is to be noted that any one or more of the aspects and embodiments described herein may be conveniently implemented using one or more machines (e.g., one or more computing devices that are utilized as a user computing device for an electronic document, one or more server devices, such as a document server, etc.) programmed according to the teachings of the present specification, as will be apparent to those of ordinary skill in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those of ordinary skill in the software art. Aspects and implementations discussed above employing software and/or software modules may also include appropriate hardware for assisting in the implementation of the machine executable instructions of the software and/or software module.
Such software may be a computer program product that employs a machine-readable storage medium. A machine-readable storage medium may be any medium that is capable of storing and/or encoding a sequence of instructions for execution by a machine (e.g., a computing device) and that causes the machine to perform any one of the methodologies and/or embodiments described herein. Examples of a machine-readable storage medium include, but are not limited to, a magnetic disk, an optical disc (e.g., CD, CD-R, DVD, DVD-R, etc.), a magneto-optical disk, a read-only memory “ROM” device, a random access memory “RAM” device, a magnetic card, an optical card, a solid-state memory device, an EPROM, an EEPROM, and any combinations thereof. A machine-readable medium, as used herein, is intended to include a single medium as well as a collection of physically separate media, such as, for example, a collection of compact discs or one or more hard disk drives in combination with a computer memory. As used herein, a machine-readable storage medium does not include transitory forms of signal transmission.
Such software may also include information (e.g., data) carried as a data signal on a data carrier, such as a carrier wave. For example, machine-executable information may be included as a data-carrying signal embodied in a data carrier in which the signal encodes a sequence of instructions, or portion thereof, for execution by a machine (e.g., a computing device) and any related information (e.g., data structures and data) that causes the machine to perform any one of the methodologies and/or embodiments described herein.
Examples of a computing device include, but are not limited to, an electronic book reading device, a computer workstation, a terminal computer, a server computer, a handheld device (e.g., a tablet computer, a smartphone, etc.), a web appliance, a network router, a network switch, a network bridge, any machine capable of executing a sequence of instructions that specify an action to be taken by that machine, and any combinations thereof. In one example, a computing device may include and/or be included in a kiosk.
Memory 3508 may include various components (e.g., machine-readable media) including, but not limited to, a random access memory component, a read only component, and any combinations thereof. In one example, a basic input/output system 3516 (BIOS), including basic routines that help to transfer information between elements within computer system 3500, such as during start-up, may be stored in memory 3508. Memory 3508 may also include (e.g., stored on one or more machine-readable media) instructions (e.g., software) 3520 embodying any one or more of the aspects and/or methodologies of the present disclosure. In another example, memory 3508 may further include any number of program modules including, but not limited to, an operating system, one or more application programs, other program modules, program data, and any combinations thereof.
Computer system 3500 may also include a storage device 3524. Examples of a storage device (e.g., storage device 3524) include, but are not limited to, a hard disk drive, a magnetic disk drive, an optical disc drive in combination with an optical medium, a solid-state memory device, and any combinations thereof. Storage device 3524 may be connected to bus 3512 by an appropriate interface (not shown). Example interfaces include, but are not limited to, SCSI, advanced technology attachment (ATA), serial ATA, universal serial bus (USB), IEEE 1394 (FIREWIRE), and any combinations thereof. In one example, storage device 3524 (or one or more components thereof) may be removably interfaced with computer system 3500 (e.g., via an external port connector (not shown)). Particularly, storage device 3524 and an associated machine-readable medium 3528 may provide nonvolatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for computer system 3500. In one example, software 3520 may reside, completely or partially, within machine-readable medium 3528. In another example, software 3520 may reside, completely or partially, within processor 3504.
Computer system 3500 may also include an input device 3532. In one example, a user of computer system 3500 may enter commands and/or other information into computer system 3500 via input device 3532. Examples of an input device 3532 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device, a joystick, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), a cursor control device (e.g., a mouse), a touchpad, an optical scanner, a video capture device (e.g., a still camera, a video camera), a touchscreen, and any combinations thereof. Input device 3532 may be interfaced to bus 3512 via any of a variety of interfaces (not shown) including, but not limited to, a serial interface, a parallel interface, a game port, a USB interface, a FIREWIRE interface, a direct interface to bus 3512, and any combinations thereof. Input device 3532 may include a touch screen interface that may be a part of or separate from display 3536, discussed further below. Input device 3532 may be utilized as a user selection device for selecting one or more graphical representations in a graphical interface as described above.
A user may also input commands and/or other information to computer system 3500 via storage device 3524 (e.g., a removable disk drive, a flash drive, etc.) and/or network interface device 3540. A network interface device, such as network interface device 3540, may be utilized for connecting computer system 3500 to one or more of a variety of networks, such as network 3544, and one or more remote devices 3548 connected thereto. Examples of a network interface device include, but are not limited to, a network interface card (e.g., a mobile network interface card, a LAN card), a modem, and any combination thereof. Examples of a network include, but are not limited to, a wide area network (e.g., the Internet, an enterprise network), a local area network (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a data network associated with a telephone/voice provider (e.g., a mobile communications provider data and/or voice network), a direct connection between two computing devices, and any combinations thereof. A network, such as network 3544, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used. Information (e.g., data, software 3520, etc.) may be communicated to and/or from computer system 3500 via network interface device 3540.
Computer system 3500 may further include a video display adapter 3552 for communicating a displayable image to a display device, such as display device 3536. Examples of a display device include, but are not limited to, a liquid crystal display (LCD), a cathode ray tube (CRT), a plasma display, a light emitting diode (LED) display, and any combinations thereof. Display adapter 3552 and display device 3536 may be utilized in combination with processor 3504 to provide graphical representations of aspects of the present disclosure. In addition to a display device, computer system 3500 may include one or more other peripheral output devices including, but not limited to, an audio speaker, a printer, and any combinations thereof. Such peripheral output devices may be connected to bus 3512 via a peripheral interface 3556. Examples of a peripheral interface include, but are not limited to, a serial port, a USB connection, a FIREWIRE connection, a parallel connection, and any combinations thereof.
The foregoing has been a detailed description of illustrative embodiments of the invention. It is noted that in the present specification and claims appended hereto, conjunctive language such as is used in the phrases “at least one of X, Y and Z” and “one or more of X, Y, and Z,” unless specifically stated or indicated otherwise, shall be taken to mean that each item in the conjunctive list can be present in any number exclusive of every other item in the list or in any number in combination with any or all other item(s) in the conjunctive list, each of which may also be present in any number. Applying this general rule, the conjunctive phrases in the foregoing examples in which the conjunctive list consists of X, Y, and Z shall each encompass: one or more of X; one or more of Y; one or more of Z; one or more of X and one or more of Y; one or more of Y and one or more of Z; one or more of X and one or more of Z; and one or more of X, one or more of Y and one or more of Z.
Various modifications and additions can be made without departing from the spirit and scope of this invention. Features of each of the various embodiments described above may be combined with features of other described embodiments as appropriate in order to provide a multiplicity of feature combinations in associated new embodiments. Furthermore, while the foregoing describes a number of separate embodiments, what has been described herein is merely illustrative of the application of the principles of the present invention. Additionally, although particular methods herein may be illustrated and/or described as being performed in a specific order, the ordering is highly variable within ordinary skill to achieve aspects of the present disclosure. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of this invention.
Exemplary embodiments have been disclosed above and illustrated in the accompanying drawings. It will be understood by those skilled in the art that various changes, omissions and additions may be made to that which is specifically disclosed herein without departing from the spirit and scope of the present invention.
Claims
1. A computer-implemented method of engineering outcome dynamics of a dynamic system that includes one or more multi-scale time-series data sets, the computer-implemented method comprising:
- training a generative model using each of the one or more multi-scale time-series data sets;
- providing an optimization problem composed of a likelihood function of the generative model and a score function that reflects a utility of the dynamic system;
- solving the optimization problem so as to determine an influence indicator indicating an influence scheme for influencing the outcome dynamics; and
- providing the influence indicator to an outcome-dynamics influencing system.
2. The computer-implemented method according to claim 1, wherein the generative model comprises a recursive convolutional Bayesian model (RCBM).
3. The computer-implemented method according to claim 2, wherein the RCBM uses a convolutional operator that carries out a scale-and-copy task.
4. The computer-implemented method according to claim 1, wherein the computer-implemented method is configured for a particular application, and the score function is selected based on the particular application.
5. The computer-implemented method according to claim 4, wherein the score function is a pattern-matching score function.
6. The computer-implemented method according to claim 4, wherein the score function is a profit-maximization score function.
7. The computer-implemented method according to claim 4, wherein the particular application comprises a social-media marketing campaign.
8. The computer-implemented method according to claim 1, wherein the particular application comprises event-alert notification.
9. The computer-implemented method according to claim 1, wherein the one or more multi-scale time-series data sets includes a social-media data set.
10. The computer-implemented method according to claim 9, wherein the social-media data set comprises a social-media data stream.
11. The computer-implemented method according to claim 1, wherein the one or more multi-scale time-series data sets includes video data.
12. A machine-readable storage medium containing computer-executable instructions for performing a method of engineering outcome dynamics of a dynamic system that includes one or more multi-scale time-series data sets, the computer-implemented method comprising:
- training a generative model using each of the one or more multi-scale time-series data sets;
- providing an optimization problem composed of a likelihood function of the generative model and a score function that reflects a utility of the dynamic system;
- solving the optimization problem so as to determine an influence indicator indicating an influence scheme for influencing the outcome dynamics; and
- providing the influence indicator to an outcome-dynamics influencing system.
13. The machine-readable storage medium of claim 12, wherein the generative model comprises a recursive convolutional Bayesian model (RCBM).
14. The machine-readable storage medium of claim 13, wherein the RCBM uses a convolutional operator that carries out a scale-and-copy task.
15. The machine-readable storage medium of claim 12, wherein the computer-implemented method is configured for a particular application, and the score function is selected based on the particular application.
16. The machine-readable storage medium of claim 15, wherein the score function is a pattern-matching score function.
17. The machine-readable storage medium of claim 15, wherein the score function is a profit-maximization score function.
18. The machine-readable storage medium of claim 15, wherein the particular application comprises a social-media marketing campaign.
19. The machine-readable storage medium of claim 12, wherein the particular application comprises event-alert notification.
20. The machine-readable storage medium of claim 12, wherein the one or more multi-scale time-series data sets includes a social-media data set.
Type: Application
Filed: May 19, 2022
Publication Date: Sep 1, 2022
Inventors: Radu Marculescu (Pittsburgh, PA), Huan-Kai Peng (Pittsburgh, PA)
Application Number: 17/748,173