INDIVIDUAL TREATMENT ASSIGNMENT FROM MIXTURE OF INTERVENTIONS
An analytics system identifies interventions for individual samples from a set of samples with a mixture of interventions. Given a causal graph, a set of baseline samples, and a set of samples with interventions, a set of intervention tuples is determined that represents the mixture of interventions for the set of samples with interventions. Each intervention tuple in the set of intervention tuples identifies an intervention and a mixing coefficient representing a percentage of samples with the intervention. An iterative process is used in which a set of intervention tuples is determined for N variables and then lifted to a set of intervention tuples for N+1 variables until all variables from the causal graph have been considered, providing a final set of intervention tuples. The final set of intervention tuples is used to match individual samples from the set of samples with interventions to interventions.
Understanding the treatment effect of interventions on a desired outcome is one of the main components of prescriptive analysis in the sciences and social sciences. In some instances, known interventions are taken on a system and the effect of the interventions can be analyzed. However, there are many situations where data being analyzed contains samples that result from unintended/unknown interventions. For example, unknown interventions often occur in gene knockout techniques. These gene knockout techniques are intended to target a particular genome. However, experiments have been observed to have off target effects which create unwanted and hidden manipulations in the genome. As another example, an unknown intervention can occur in a system analyzing promotional emails when a promotional email ends up in a spam folder and therefore never reaches a targeted individual. Similarly, an automated campaign might end up sending an email with incorrect or unintended content. In such scenarios, different samples can get randomly exposed to different unknown interventions, thereby creating a mixture of interventions.
SUMMARY
Embodiments of the present invention relate to, among other things, an analytics system that identifies interventions for individual samples from a set of samples with a mixture of interventions. Given a causal graph, a set of baseline samples, and a set of samples with interventions, a set of intervention tuples is determined that represents the mixture of interventions for the set of samples with interventions. Each intervention tuple in the set of intervention tuples identifies an intervention and a mixing coefficient representing a percentage of samples with the intervention. An iterative process is used in which a set of intervention tuples is determined for N variables and then lifted to a set of intervention tuples for N+1 variables until all variables from the causal graph have been considered, providing a final set of intervention tuples.
At each iteration, the set of intervention tuples for N+1 variables is determined by solving a system of equations using probability distributions calculated over the N+1 variables. In some instances, the probability distributions are perturbed to ensure all probabilities are positive. The system of equations is solved by setting each mixing coefficient to zero one at a time to determine values for the remaining mixing coefficients. This provides a number of candidate sets of intervention tuples, and the set of intervention tuples for the N+1 variables is selected from the candidate sets (e.g., based on an L2 norm for each candidate set).
Once the set of intervention tuples has been determined for all variables in the causal graph, individual samples from the set of samples with interventions are matched to interventions based on the set of intervention tuples. This can include assigning a sample to an intervention that maximizes the probability that the sample resulted from the intervention.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The present invention is described in detail below with reference to the attached drawing figures, wherein:
Various terms are used throughout this description. Definitions of some terms are included below to provide a clearer understanding of the ideas disclosed herein.
Notation: Capital letters (e.g., X) are used herein to represent random variables, and the corresponding lower case letter x denotes the assignment X=x. The set of values taken by random variable X is denoted by $C_X$. Each random variable can be discrete and have finite support, i.e., $|C_X| < \infty$. A tuple or set of random variables is denoted by a capital boldface letter (e.g., $\mathbf{X}$), and the corresponding lower case boldface letter $\mathbf{x}$ denotes the assignment $\mathbf{X}=\mathbf{x}$. Let $C_{\mathbf{X}} = \prod_{X \in \mathbf{X}} C_X$.
As used herein, a “causal graph” depicts causal relationships among a set of variables. The causal graph can include a number of nodes with edges between the nodes. Each node corresponds with a variable and each edge represents a causal relationship between two nodes/variables. The variables can include treatment variables, co-variates, and outcomes. In some instances, the causal graph comprises a directed acyclic graph (DAG). The causal graph can be manually generated using expert knowledge or learned from observational data (e.g., feeding observational data into known algorithms for learning a causal graph). Let $G=\{V, \mathcal{E}\}$ be a DAG with node set $V=\{V_1, \ldots, V_n\}$ where each node $V_i$ represents a random variable. $G$ is called a Bayesian Network if the following factorization of the joint probability of $V$ holds:

$$P(V) = \prod_{i=1}^{n} P(V_i \mid pa(V_i))$$

where $pa(V_i)$ are the parent nodes of $V_i$. A causal Bayesian Network (CBN) is a Bayesian Network where all edges denote direct causal relationships. It allows for modeling the effect of external actions, called “interventions,” by appropriate modification of the Bayesian Network.
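By way of illustration only, the factorization can be sketched in a few lines of Python. The three-node graph, the variable names A, B, and C, and all probability values below are hypothetical and are chosen solely to show how a joint probability is assembled from per-node conditional probabilities.

```python
from itertools import product

# Hypothetical three-node DAG: A -> B, A -> C, B -> C (all variables binary).
parents = {"A": (), "B": ("A",), "C": ("A", "B")}

# Conditional probability tables: cpd[node][parent_values][value].
cpd = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.1, 1: 0.9},
    },
}


def joint_probability(assignment):
    """Return P(v) as the product of P(v_i | pa(v_i)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpd[node][pa_values][assignment[node]]
    return prob


# Sanity check: the factorized joint sums to one over all assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

Storing one table per node, keyed by the parent values, mirrors the structure of the factorization: each factor depends only on a node and its parents.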
An “intervention” comprises an external action on a system under investigation that sets a particular value for a variable. For example, an intervention can comprise administration of a dosage of medicine to a patient. In some cases, an intervention is known or intended; while in other cases, an intervention is unknown or unintended. For example in the case of gene-editing, a known/intended intervention occurs when a certain target gene is spliced out and replaced with the desired gene. However, the gene-editing also causes unintended cleavage at unknown genome sites, which comprise unknown/unintended interventions.
A natural way to model interventions in causal Bayesian Networks is to perform causal surgery, wherein incoming edges into the node(s) to be intervened on are removed and the node(s) are forcibly fixed to the desired value. The new network thus obtained is treated as the Bayesian Network modelling the effect of the intervention. Formally, if an intervention is performed on nodes $\mathbf{X} \subseteq V$ with a desire to set them to value $\mathbf{x}^* \in C_{\mathbf{X}}$, then the effect of this intervention (also known as the interventional distribution) is a probability distribution on $V$ denoted as $P(v \mid do(\mathbf{x}^*))$ (or $P_{\mathbf{x}^*}(v)$). In the intervened Bayesian Network, the conditional probability distribution (CPD) $P(X_i \mid pa(X_i))$ of each $X_i \in \mathbf{X}$ that is intervened on and set to $x_i^*$ changes to the Kronecker delta function $\delta_{x_i, x_i^*}$, which equals one when $x_i = x_i^*$ and zero otherwise.
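A minimal sketch of causal surgery is given below, assuming a hypothetical three-node network Z -> X, Z -> Y, X -> Y with made-up probability values. The function name interventional_probability is illustrative and is not part of any described embodiment. An intervened node ignores its parents and contributes a Kronecker delta factor, while every other node keeps its original conditional distribution.

```python
from itertools import product

# Hypothetical network: Z -> X, Z -> Y, X -> Y (all variables binary).
parents = {"Z": (), "X": ("Z",), "Y": ("Z", "X")}
cpd = {
    "Z": {(): {0: 0.6, 1: 0.4}},
    "X": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "Y": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.1, 1: 0.9},
    },
}


def interventional_probability(assignment, intervention):
    """Return P(v | do(intervention)) for a full assignment v."""
    prob = 1.0
    for node, pa in parents.items():
        if node in intervention:
            # Causal surgery: drop the parents and force the target value.
            prob *= 1.0 if assignment[node] == intervention[node] else 0.0
        else:
            pa_values = tuple(assignment[p] for p in pa)
            prob *= cpd[node][pa_values][assignment[node]]
    return prob


# P(Y=1 | do(X=1)), obtained by summing out Z and X.
p_y1_do_x1 = sum(
    interventional_probability({"Z": z, "X": x, "Y": 1}, {"X": 1})
    for z, x in product((0, 1), repeat=2)
)
```

Passing an empty intervention recovers the original (observational) joint distribution, which is how the absence of an intervention can be represented in the same code path.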
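A “mixture of interventions” comprises a system in which multiple known and/or unknown interventions have occurred. Let $G=\{V, \mathcal{E}\}$ be a causal Bayesian Network. A probability distribution $P_{mix}(V)$ is called a mixture of interventions if, for some $m \in \mathbb{N}$, there exist subsets $T_1, \ldots, T_m \subseteq V$, corresponding values $t_i \in C_{T_i}$, and positive scalars $\pi_i$, $i \in [m]$, with $\sum_{i=1}^{m} \pi_i = 1$, such that

$$P_{mix}(V) = \sum_{i=1}^{m} \pi_i \, P_{t_i}(V)$$

where $t_i \neq t_j$ for all $i \neq j \in [m]$ (if $t_i = t_j$, then the two terms can be combined into the single term $(\pi_i+\pi_j) P_{t_i}(V)$).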
An “intervention tuple” comprises an identification of an intervention (ti) and a mixing coefficient (πi) associated with the intervention. The intervention can comprise a set value for a given variable. The mixing coefficient specifies a percentage of a population having the corresponding intervention.
A “sample” refers to an individual observation of data from an overall collection of data. A sample can provide an indication of a value for each of a number of variables. For example, a sample can identify a patient, the patient's age, the patient's gender, whether the patient received a vaccine, and an outcome for the patient. As another example, a sample can correspond with an instance of a marketing email for an individual, specifying a number of past marketing emails opened by the individual, a subject line for a current marketing email, and an indication of whether the individual opened the current marketing email.
Overview
Current analytics systems often use causal graphs, such as Causal Bayesian Networks (CBN), to model causal relationships in many real-world systems. These models can simulate the effects of external interventions that forcibly fix target system variables to desired target values. The simulation is done via the do( ) operator, wherein the CBN is altered by breaking incoming edges of the target variables and fixing them to desired target values. These interventions on the CBN can be used to estimate the effects of treatments on desired outcomes. However, real-world interventions are not always precise and might end up as incorrect, unknown and unintended. There are often situations where data being analyzed contains samples of many such unintended/unknown interventions.
As an example to illustrate, consider an email marketing system that sends email promotions with two possible subject line options: Subject_1 or Subject_2. For each customer, the subject line is selected depending on how many emails the customer has opened in the past. Individuals who engaged more in the past get Subject_1 (with high probability) and those who engage less get Subject_2 (with high probability). Finally, at the end of the promotion duration, for each customer, information is obtained on whether the email was opened by the customer or not. The data is used to generate a CBN with three variables: “Past_Opens”, “Treatment”, and “Open”. The Past_Opens variable denotes the co-variate indicating the number of emails opened by the customer in the past. The Treatment variable denotes which of the two subject lines was sent to the customer. The Open variable is the outcome and can be one of {Yes, No}. Customer data from historical campaigns (and experiments if necessary) can be used to estimate the conditional probabilities P(Open|Treatment, Past_Opens), P(Treatment|Past_Opens) and P(Past_Opens) and therefore have a complete description of the CBN.
Now consider a new campaign, where an unknown and unintended intervention occurs at the customer's end. For instance, consider a customer who has made many opens in the past and therefore was sent an email with subject line Subject_1. However, due to a new spam filter deployed by the customer's email provider, the email gets filtered as spam and therefore never gets opened. The analytics system has no knowledge of the spam filter, and it could be assumed that the email was not opened because the customer was not interested. In general, many such unknown interventions on different customers might be happening, making this problem quite challenging for the analytics system to resolve. Identifying which customer went through which unknown interventions could be of high value.
As another example, gene knockout experiments via the CRISPR-Cas9 gene-editing technology are intended to target known genome sites. However, the technology results in unintended cleavage at unknown genome sites. Moreover, the unintended intervention targets can themselves be noisy; i.e., different individuals targeted by the same intervention might undergo completely different off-target interventions. For example, some studies have demonstrated that the same gene editing experiment (using CRISPR-Cas9) on mice embryos exhibited different unintended cleavage for different mice.
In both the above situations, samples (e.g., individual customers or gene-editing targets) that underwent different unintended (or no) interventions are not segregated and therefore the generated distribution becomes a mixture of individual interventional distributions.
Some existing solutions have been proposed for this problem but the solutions have drawbacks. For instance, one solution employs an algorithm to learn such unintended interventions under the assumption that each unintended intervention is only on a single variable. This is a very restrictive assumption since the unintended intervention is generally not in control and could be possibly affecting multiple treatment variables. This can be seen by extending the above marketing example to have two treatment variables, the promotional email (i.e., email promotions with two possible subject line options: Subject_1 or Subject_2) and an app notification. It's possible that some customers who were affected by the unintended intervention due to the spam filter also disable notifications on their phone leading to an unintended and unknown intervention on two variables. The existing solution does not tackle this situation and only works when all unintended interventions are on a single variable.
Another existing technique uses a brute force algorithm to solve this problem. However, in the case of N variables with k categories each, there are Ω(k^N) possible interventions, and therefore the resulting system of equations involves exponentially many unknowns. This makes the approach infeasible even for small values of N.
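The growth can be made concrete with a short calculation. The sketch below assumes that each of the N variables is either left untouched or fixed to one of its k values, which gives (k+1)^N candidate intervention targets including the empty intervention. This count is an illustrative assumption that is consistent with the Ω(k^N) bound, and the function name is hypothetical.

```python
def count_intervention_targets(num_variables, num_values):
    """Count targets when each variable is untouched or fixed to one value."""
    return (num_values + 1) ** num_variables


# Number of candidate targets for k = 3 and a few graph sizes.
counts = {n: count_intervention_targets(n, 3) for n in (2, 5, 10, 20)}
```

With k=3, ten variables already yield more than a million candidate targets, and twenty variables yield more than a trillion.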
Embodiments of the present invention solve these problems by providing an analytics system that identifies interventions for individual samples from a set of samples having a mixture of interventions. The mixture of interventions includes known and/or unknown interventions. As will be described in further detail herein, some configurations of the technology described herein enforce two conditions to determine interventions and their mixing coefficients (percentage of samples) in the mixture of interventions. The first condition is that a given system satisfies positivity; i.e., the probability distributions generated from samples have all positive probabilities. The second condition is that a set of intervention tuples (interventions and their corresponding mixing coefficients) satisfies exclusion; i.e., some setting of each variable is missing from the mixture. Using these two conditions, a set of intervention tuples can be determined for samples with a mixture of interventions, and the set of intervention tuples can in turn be used to match individual samples to interventions.
In accordance with some aspects of the technology described herein, a set of samples having a mixture of interventions is received for which assignment of individual samples to interventions is to be determined. Additionally, a causal graph and a set of baseline samples (i.e., samples without the unknown mixture of interventions) are received. Given the causal graph, set of baseline samples, and set of samples with interventions, a set of intervention tuples is determined, representing the mixture of interventions. Each intervention tuple in the set of intervention tuples identifies an intervention (i.e., setting of a particular variable to a particular value) and a mixing coefficient for the intervention (i.e., a percentage of the samples, from the set of samples having interventions, for which that intervention contributed to the samples).
The set of intervention tuples is determined using an iterative process in which a set of intervention tuples is determined for N variables and then lifted to a set of intervention tuples for N+1 variables. For instance, given a causal graph with three variables, a set of intervention tuples is initially determined for a first variable from the causal graph, and that set of intervention tuples is lifted to a set of intervention tuples for the first variable and a second variable, which is then lifted to a set of intervention tuples for all three variables. The order in which the variables are selected can be based on a topological ordering of the variables in the causal graph.
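The variable ordering can be sketched with the Python standard library (graphlib, available in Python 3.9 and later). The edges below are an assumption based on the three-variable email example (Past_Opens influences Treatment, and both influence Open); the prefixes list is an illustrative name for the variable sets considered at successive iterations.

```python
from graphlib import TopologicalSorter

# Map each node to its parents, following the email marketing example.
parents = {
    "Past_Opens": [],
    "Treatment": ["Past_Opens"],
    "Open": ["Past_Opens", "Treatment"],
}

# static_order() yields every node after all of its parents.
order = list(TopologicalSorter(parents).static_order())

# Variable sets considered at successive iterations (N = 1, 2, 3).
prefixes = [order[: n + 1] for n in range(len(order))]
```

Processing variables in this order means that, when a new variable is added, all of its parents are already among the variables handled in earlier iterations.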
To determine the set of intervention tuples at each iteration, a system of equations is generated and solved using probability distributions estimated on the variables under consideration for the iteration. The probability distributions are estimated given the set of baseline samples and the set of samples with interventions. In some configurations, positivity is enforced by perturbing the probability distributions such that all probabilities are greater than zero. To solve the system of equations, exclusion is enforced by setting each mixing coefficient to zero one at a time to provide candidate sets of intervention tuples (i.e., a candidate set of intervention tuples for each time a mixing coefficient is set to zero). A set of intervention tuples is selected from the candidate sets of intervention tuples. For instance, an L2 norm can be computed for each candidate set of intervention tuples, and the set of intervention tuples having the lowest L2 norm is selected. In some configurations, a threshold is employed such that any mixing coefficient less than the threshold is set to zero and the remaining mixing coefficients are renormalized. This ensures that any intervention with a low mixing coefficient is removed from consideration.
Once the set of intervention tuples has been determined for all variables in the causal graph, individual samples from the set of samples with interventions are matched to interventions based on the set of intervention tuples. This can include determining, from the set of intervention tuples, the intervention that maximizes the probability that the sample resulted from the intervention. The sample can then be assigned to that intervention.
The technology described herein provides a number of advantages over conventional approaches. For instance, the technology described herein can identify a mixture of interventions and assign interventions to samples for systems having any number of variables. This is in contrast to some solutions that are limited to determining interventions for a single variable. While brute force approaches are not limited to a single variable, in the case of N variables with k categories each, there are Ω(k^N) possible interventions and the resulting system of equations involves exponentially many unknowns, thereby making it infeasible even for small values of N. In contrast, the time complexity of the technology described herein does not have an exponential dependence on N, thereby making it more efficient. As will be described in further detail below, the approach of the technology described herein is compared to the brute force approach for a small graph (due to the reason just mentioned), demonstrating better performance of the technology described herein even with smaller graphs. For causal graphs with even a moderate number of nodes, the brute force approach becomes intractable, while the approach of the technology described herein can operate on such graphs.
Example System for Identifying Interventions
With reference now to the drawings,
The system 100 is an example of a suitable architecture for implementing certain aspects of the present disclosure. Among other components not shown, the system 100 includes a user device 102 and an analytics system 104. Each of the user device 102 and analytics system 104 shown in
At a high level, given a set of samples with a mixture of interventions, the analytics system 104 identifies interventions and matches the interventions to individual samples. As shown in
The analytics system 104 can be implemented using one or more server devices, one or more platforms with corresponding application programming interfaces, cloud infrastructure, and the like. While the analytics system 104 is shown separate from the user device 102 in the configuration of
The analytics system 104 operates on inputs that include a causal graph, a set of baseline samples, and a set of samples with interventions. The causal graph provides information regarding causal relationships among variables for a system being analyzed. Prior knowledge of the system along with state of the art algorithms can be used to learn the causal graph when no unknown interventions happen. The set of baseline samples comprises a collection of samples in which unknown interventions have not occurred. The set of samples with a mixture of interventions comprises samples in which an unknown intervention has occurred in at least some of the samples.
By way of example for illustration purposes,
Let $G=\{V, \mathcal{E}\}$ be a causal graph (e.g., a causal Bayesian Network) and $P(V)$ be the associated joint probability distribution. Let $P_{mix}(V)$ be any mixture of interventions with the set of intervention tuples $\mathcal{T}=\{(t_1, \pi_1), \ldots, (t_m, \pi_m)\}$. The analytics system 104 is given access to $G$ and finitely many samples from $P(V)$ (i.e., the set of baseline samples) and $P_{mix}(V)$ (i.e., the set of samples with a mixture of interventions). The intervention identification module 108 determines the set of unknown interventions in $\mathcal{T}$. The intervention assignment module 110 determines, for each given sample from $P_{mix}(V)$, the actual intervention (from the ones in $\mathcal{T}$) that generated the sample.
In accordance with aspects of the technology described herein, each of the interventions $t_i$ corresponds to an intervention that intentionally or unintentionally transpired. Since the ultimate goal of the intervention identification module 108 is to recover the interventions $t_i$ from the mixture distribution, the set of intervention tuples should “uniquely” define the mixture. Formally, there should not exist two distinct sets of intervention tuples $\mathcal{T}^1=\{(t_1^1, \pi_1^1), \ldots, (t_n^1, \pi_n^1)\}$ and $\mathcal{T}^2=\{(t_1^2, \pi_1^2), \ldots, (t_m^2, \pi_m^2)\}$ which generate the same mixture distribution, i.e.,

$$\sum_{i=1}^{n} \pi_i^1 \, P_{t_i^1}(V) = \sum_{j=1}^{m} \pi_j^2 \, P_{t_j^2}(V).$$
Given access to a causal graph $G$ and the joint distribution $P(V)$ it captures, the intervention identification module 108 takes as input the mixture distribution $P_{mix}(V)$ and recovers the unknown set of intervention tuples $\mathcal{T}$ that generated $P_{mix}(V)$ by employing two mild assumptions to provide uniqueness and identifiability.
The first assumption is referred to herein as “positivity.” Let $V$ be the set of nodes in the causal graph $G$ and $P(V)$ be the corresponding joint probability distribution. Positivity assumes that $P(v)>0$ for all $v \in C_V$. The second assumption is referred to herein as “exclusion.” Let $\mathcal{T}$ be a set of intervention tuples. $\mathcal{T}$ satisfies exclusion if, for all $V_i \in V$, there exists a value $v_i \in C_{V_i}$ that is not present in any intervention target in $\mathcal{T}$.
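The need for exclusion can be illustrated with a small single-variable example, sketched below under hypothetical values. Two different sets of intervention tuples produce exactly the same mixture distribution: one contains no intervention at all, while the other intervenes on every sample and uses every value of the variable, thereby violating exclusion. The function name single_variable_mixture is illustrative only.

```python
def single_variable_mixture(baseline, pi_null, pi_values):
    """Return pi_null * P(v_i) + pi_i for each value v_i of one variable.

    The second term reflects that do(V = v_i) puts all mass on v_i.
    """
    return [pi_null * p + pi for p, pi in zip(baseline, pi_values)]


baseline = [0.5, 0.5]  # strictly positive, so positivity holds

# Tuple set 1: no intervention at all.
mix_a = single_variable_mixture(baseline, 1.0, [0.0, 0.0])

# Tuple set 2: every sample intervened on, half set to each value.
# Both values appear as targets, so this set violates exclusion.
mix_b = single_variable_mixture(baseline, 0.0, [0.5, 0.5])
```

Because mix_a and mix_b coincide, the two tuple sets cannot be told apart from the mixture alone. Requiring at least one value to be absent from the targets rules out the second set.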
Given the set of baseline samples and the set of samples with interventions, the intervention identification module 108 estimates marginal and conditional probabilities of the underlying distributions. For instance, the set of baseline samples can be represented by $D=\{b_1, \ldots, b_M\}$ where $b_i \sim P(V)$, and the set of samples with interventions can be represented by $D_{mix}=\{b_1^{mix}, \ldots, b_M^{mix}\}$ where $b_j^{mix} \sim P_{mix}(V)$. The intervention identification module 108 enforces positivity on the estimates by ensuring that all probabilities are non-zero. This can be done by perturbing the distributions slightly, for instance, using a small positive constant $\delta$ such that all probabilities are non-zero.
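One possible way to estimate a distribution from samples and enforce positivity is sketched below. The function name estimate_distribution, the default value of delta, and the choice to replace only zero-probability outcomes before renormalizing are assumptions made for illustration; other perturbation schemes would also satisfy the positivity requirement.

```python
from collections import Counter


def estimate_distribution(samples, support, delta=1e-6):
    """Empirical distribution over `support`, perturbed to be strictly positive."""
    counts = Counter(samples)
    total = len(samples)
    raw = {}
    for value in support:
        p = counts[value] / total
        # Replace a zero estimate with the small constant delta.
        raw[value] = p if p > 0 else delta
    # Renormalize so the perturbed estimates sum to one.
    norm = sum(raw.values())
    return {value: p / norm for value, p in raw.items()}


baseline_samples = ["a", "a", "b", "a"]  # value "c" is never observed
p_hat = estimate_distribution(baseline_samples, support=["a", "b", "c"])
```

The unobserved value receives a small positive probability, so later steps that divide by an estimated probability remain well defined.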
The intervention identification module 108 uses the probability distribution estimates to determine the set of intervention tuples. Each intervention tuple in the set identifies a particular intervention (i.e., setting a variable to a certain value) and a corresponding mixing coefficient (i.e., a percentage of a population that received the intervention). The following description first discusses the determination of a set of intervention tuples given a single variable. This is followed by a description of determining a set of intervention tuples for a system with multiple variables by iteratively lifting a solution from N variables to N+1 variables.
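Single Variable: When there is a single variable (i.e., $|V|=1$, say $V=\{V\}$), the most general form of $P_{mix}$ (i.e., allowing for scalar weights to be ≥0) can be written as:

$$P_{mix}(V) = \pi_0 P_{t_0}(V) + \pi_1 P_{t_1}(V) + \cdots + \pi_k P_{t_k}(V) \qquad (1)$$

where $t_0=\emptyset$, $t_1=v_1, \ldots, t_k=v_k$ are intervention targets corresponding to the different possible values $v_1, \ldots, v_k$ of $V$, with $\pi_0=1-(\pi_1+\cdots+\pi_k)$.

Since the intervention identification module 108 only has access to estimates $\hat{P}$, $\hat{P}_{mix}$ of $P$, $P_{mix}$ respectively, the intervention identification module 108 sets up the above system using these estimates. In particular, because $P_{t_0}(V)=P(V)$ and $P_{t_i}(V=v_j)$ equals one when $i=j$ and zero otherwise, equation (1) is rearranged to get a system of the form $A\pi=b$ given as,

$$\begin{pmatrix} 1-a_1 & -a_1 & \cdots & -a_1 \\ -a_2 & 1-a_2 & \cdots & -a_2 \\ \vdots & \vdots & \ddots & \vdots \\ -a_k & -a_k & \cdots & 1-a_k \end{pmatrix} \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_k \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix} \qquad (2)$$

where $b_i=\hat{P}_{mix}(v_i)-\hat{P}(v_i)$ and $a_i=\hat{P}(v_i)>0$ (i.e., enforcing positivity).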
To enforce exclusion, the intervention identification module 108 sets each mixing coefficient to zero one at a time and solves the system of equations (2) using the probability estimates ($\hat{P}$, $\hat{P}_{mix}$) to determine the values for the mixing coefficients ($\pi_1, \ldots, \pi_k$). This provides a number of candidate sets of intervention tuples. In particular, for every mixing coefficient that is set to zero, a candidate set of intervention tuples ($\mathcal{T}=\{(t_i, \pi_i): i \in [k]\}$) is determined. For instance, $\pi_1$ is set to zero and the system of equations (2) is solved to determine mixing coefficients for a first candidate set of intervention tuples; $\pi_2$ is set to zero and the system of equations (2) is solved to determine mixing coefficients for a second candidate set of intervention tuples; etc. For every such candidate set of intervention tuples $\mathcal{T}$, the intervention identification module 108 iterates through the tuples $(t_i, \pi_i)$ and sets any negative mixing coefficient to zero (i.e., if some $\pi_i<0$, set $\pi_i \leftarrow 0$).
The intervention identification module 108 compares the candidate sets of intervention tuples to select one as the set of intervention tuples. In some configurations, the intervention identification module 108 computes the L2 norm of the residual for each candidate set of intervention tuples and selects the candidate set of intervention tuples with the lowest L2 norm. For instance, this can comprise computing the score $r(\mathcal{T})=\|A\pi-b\|_2$ and selecting the set of intervention tuples with the smallest value of $r(\mathcal{T})$.
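The single-variable procedure can be sketched as follows. The sketch assumes the system has the form in which row i reads π_i − a_i·(π_1 + … + π_k) = b_i, so that fixing π_e = 0 gives the sum of the coefficients directly from row e as −b_e/a_e. This closed-form shortcut, the function names, and the example probabilities are illustrative assumptions; a general-purpose linear solver could be used instead.

```python
import math


def residual(a, b, pi):
    """L2 norm of A @ pi - b, where row i of A @ pi is pi_i - a_i * sum(pi)."""
    s = sum(pi)
    return math.sqrt(sum((p - ai * s - bi) ** 2 for p, ai, bi in zip(pi, a, b)))


def solve_single_variable(p_base, p_mix):
    """Return (pi, excluded_index, score) for the best exclusion candidate."""
    a = list(p_base)                            # a_i = P(v_i) > 0 (positivity)
    b = [m - p for m, p in zip(p_mix, p_base)]  # b_i = P_mix(v_i) - P(v_i)
    best = None
    for e in range(len(a)):
        # With pi_e = 0, row e of the system gives sum(pi) = -b_e / a_e.
        s = -b[e] / a[e]
        pi = [0.0 if i == e else b[i] + a[i] * s for i in range(len(a))]
        pi = [max(p, 0.0) for p in pi]          # set negative coefficients to zero
        score = residual(a, b, pi)
        if best is None or score < best[2]:
            best = (pi, e, score)
    return best


p_base = [0.5, 0.3, 0.2]
# Hypothetical mixture: 60% of samples untouched, 40% set to the first value.
p_mix = [0.6 * p + w for p, w in zip(p_base, [0.4, 0.0, 0.0])]
pi, excluded, score = solve_single_variable(p_base, p_mix)
```

In this example, excluding the first value forces negative coefficients that are clipped to zero and leaves a large residual, whereas excluding either of the other two values recovers a coefficient of about 0.4 for the first value with a residual near zero.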
In some configurations, a threshold ϵ can be employed such that only mixing coefficients greater than the threshold are retained. Any mixing coefficient below the threshold is set to zero, and the remaining mixing coefficients above the threshold can be renormalized.
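A thresholding step of this kind might look like the sketch below. It assumes that the list of tuples includes the no-intervention component and that renormalizing means rescaling the retained coefficients to sum to one; both are assumptions made for illustration, as are the function name and the example values.

```python
def prune_and_renormalize(tuples, epsilon):
    """Keep tuples with coefficient above epsilon and rescale them to sum to one."""
    kept = [(target, pi) for target, pi in tuples if pi > epsilon]
    total = sum(pi for _, pi in kept)
    if total == 0:
        return []  # nothing survives the threshold
    return [(target, pi / total) for target, pi in kept]


tuples = [("no_intervention", 0.58), ("V=v1", 0.40), ("V=v2", 0.02)]
pruned = prune_and_renormalize(tuples, epsilon=0.05)
```

Here the component with coefficient 0.02 is dropped, and the remaining two coefficients are divided by 0.98.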
Selection of the threshold ϵ impacts the time complexity of the system as follows:
N is the number of nodes in the causal graph G, d is the maximum in-degree of any node, kmax is the maximum number of values that any node in G can take, and M is the number of samples present in $D$ and $D_{mix}$. Since the algorithm's run-time depends on ϵ, the value of ϵ can be selected based on desired outcomes. Setting ϵ too small could increase the run time, whereas setting ϵ too large could lead to pruning interventions with significant mixing proportions present in the mixture.
Multiple Variables: Where the system includes multiple variables, the intervention identification module 108 reduces to a problem with N variables and, using a recursive call to this function, computes its solution. The computed solution for N variables is lifted to a solution on N+1 variables. By way of example to illustrate with reference to the causal graph 200 of
At each iteration, the intervention identification module 108 reduces from N+1 variables to N variables. Let $V_1, \ldots, V_{N+1}$ denote a topological order in $G$. The approach marginalizes on $V_{N+1}$ to create access to $P_{mix}(\mathbf{V}_N)$ and $P(\mathbf{V}_N)$, where $\mathbf{V}_N=(V_1, \ldots, V_N)$, and constructs $G_N=G \setminus \{V_{N+1}\}$. The algorithm is recursively called with inputs $G_N$, $P(\mathbf{V}_N)$, $P_{mix}(\mathbf{V}_N)$ to obtain the set of intervention tuples $\mathcal{S}=\{(s_1, \mu_1), \ldots, (s_q, \mu_q)\}$.
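The marginalization step can be sketched as follows, assuming a joint distribution stored as a dictionary keyed by tuples of values listed in topological order. The representation, the function name marginalize_last, and the example probabilities are hypothetical.

```python
from collections import defaultdict


def marginalize_last(joint):
    """Sum out the last variable of a joint given as {tuple_of_values: prob}."""
    marginal = defaultdict(float)
    for values, prob in joint.items():
        # Drop the final coordinate and accumulate its probability mass.
        marginal[values[:-1]] += prob
    return dict(marginal)


# Hypothetical joint over (V1, V2), with variables in topological order.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
marginal = marginalize_last(joint)
```

Because the last variable in a topological order has no children, removing it leaves a valid graph over the remaining variables, and the same function can be applied to both the baseline and the mixture distributions.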
The intervention identification module 108 lifts the solution for N variables to N+1 variables. To lift the set of intervention tuples for N variables to a set of intervention tuples for N+1 variables, it is first noted that the only intervention components that can appear in the original mixture are of the form si∪{vj}, i∈{1, . . . , q}, j∈{1, . . . , k}. Here v1, . . . , vk are all possible values of the variable VN+1. Therefore the original mixture has the form:
where πs
In the above system, the known values are renamed as follows. For l∈[k], denote:
To enforce exclusion, one mixing coefficient is set to zero at a time giving a solution (πs
The intervention identification module 108 compares the candidate sets of intervention tuples to select one as the set of intervention tuples. In some configurations, the intervention identification module 108 computes the L2 norm of the residual for each candidate set of intervention tuples and selects the candidate set of intervention tuples with the lowest L2 norm. For instance, this can comprise computing the score $r(\mathcal{T})=\|A\pi-b\|_2$ where π=(πs
At the end of this process, all the intervention tuples thus obtained are collected (for all i∈[q]) in the set $\mathcal{T}$. To make sure that exclusion is satisfied for variable VN+1, the excluded value of node VN+1 is found, i.e., the value which is not present in any target in $\mathcal{T}$. If no such value exists, value v of VN+1 which minimizes Σi=1qπs
Given a set of intervention tuples determined by the intervention identification module 108, the intervention assignment module 110 maps samples, from the set of samples with interventions, to interventions. Let $\mathcal{T}=\{(t_i, \pi_i)\}$ be the set of intervention tuples obtained above. For each sample $b_j^{mix}$ in the set of samples with interventions $D_{mix}$, the intervention assignment module 110 finds the intervention $t_i$ in the set of intervention tuples that maximizes the probability that the sample resulted from the intervention $t_i$, i.e., $P(b_j^{mix} \mid do(t_i))$. The intervention assignment module 110 returns an indication that the sample $b_j^{mix}$ was created due to that intervention $t_i$.
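The matching step can be sketched as follows, assuming a hypothetical two-node network X -> Y with made-up conditional probabilities. An intervention is represented as a dictionary from variable name to forced value, with the empty dictionary standing for no intervention. As in the description above, the rule compares the interventional probabilities only, and the mixing coefficients are carried along but not used in the comparison.

```python
# Hypothetical network: X -> Y (both variables binary).
parents = {"X": (), "Y": ("X",)}
cpd = {
    "X": {(): {0: 0.8, 1: 0.2}},
    "Y": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.3, 1: 0.7}},
}


def prob_given_do(sample, intervention):
    """Return P(sample | do(intervention)) using causal surgery."""
    prob = 1.0
    for node, pa in parents.items():
        if node in intervention:
            # Intervened node: Kronecker delta on the forced value.
            prob *= 1.0 if sample[node] == intervention[node] else 0.0
        else:
            prob *= cpd[node][tuple(sample[p] for p in pa)][sample[node]]
    return prob


def assign_intervention(sample, tuples):
    """Return the intervention t_i that maximizes P(sample | do(t_i))."""
    best, _ = max(tuples, key=lambda tp: prob_given_do(sample, tp[0]))
    return best


# Intervention tuples: (intervention, mixing coefficient).
tuples = [({}, 0.7), ({"X": 1}, 0.3)]
assigned_a = assign_intervention({"X": 1, "Y": 1}, tuples)
assigned_b = assign_intervention({"X": 0, "Y": 0}, tuples)
```

The first sample is far more likely under do(X=1) than under no intervention, while the second sample has probability zero under do(X=1) and is therefore matched to no intervention.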
The user interface (UI) module 112 of the analytics system 104 provides one or more user interfaces for interacting with the system. For instance, the UI module 112 can provide user interfaces for receiving input, such as a causal graph and sample sets, and providing output, such as an indication of a set of intervention tuples and/or assignments of individual samples to interventions. For instance, the UI module 112 can provide user interfaces to a user device, such as the user device 102. The user device 102 can be any type of computing device, such as, for instance, a personal computer (PC), tablet computer, desktop computer, mobile device, or any other suitable device having one or more processors. As shown in
With reference now to
As shown at block 302, input is received, including a causal graph, a set of baseline samples, and a set of samples with interventions. Estimated probability distributions are determined using the set of baseline samples and the set of samples with interventions, as shown at block 304. In order to ensure positivity, the estimated probability distributions are perturbed so that all estimated probabilities are non-zero, as shown at block 306. This can include adding a small constant to any estimated probabilities that are zero and renormalizing the estimated probability distributions.
Candidate sets of intervention tuples are determined using the estimated probability distributions, as shown at block 308. As described hereinabove, this can include generating a system of equations and using the estimated probabilities to determine mixing coefficients for interventions. Enforcing exclusion, each mixing coefficient is set to zero one at a time, and the system of equations is solved using the probability estimates to determine the values of the mixing coefficients. This provides multiple candidate sets of intervention tuples, one set for each mixing coefficient being set to zero. In some configurations, any negative mixing coefficient is set to zero. Additionally, if a threshold is employed, any mixing coefficient below the threshold can be set to zero, and the remaining coefficients above the threshold can be renormalized.
As shown at block 310, the candidate sets of interventions tuples are compared, and one is selected as the set of intervention tuples. In some instances, this comprises computing the L2 norm for each candidate set of intervention tuples and selecting the candidate set of intervention tuples with the lowest L2 norm.
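Blocks 308 and 310 can be sketched for a single variable as follows. The system of equations is not reproduced in this section, so the sketch assumes one plausible form: the mixture distribution equals the no-intervention coefficient times the baseline distribution, plus the coefficient of the intervention that forces each category. It also assumes the L2 norm is taken over the vector of mixing coefficients. Both are assumptions, as are all function names and the toy numbers.

```python
import math
from typing import Dict, Hashable, List

Coeffs = Dict[Hashable, float]


def candidate_for_excluded(p_base: Coeffs, p_mix: Coeffs, excluded: Hashable) -> Coeffs:
    """Solve p_mix[x] = pi_none * p_base[x] + pi[x] with pi[excluded] fixed to 0.

    The assumed system has one more unknown than equations; fixing one
    coefficient to zero (exclusion) makes it solvable in closed form.
    """
    pi_none = p_mix[excluded] / p_base[excluded]  # relies on positivity
    cand: Coeffs = {"none": pi_none}
    for x in p_base:
        cand[x] = 0.0 if x == excluded else p_mix[x] - pi_none * p_base[x]
    return cand


def l2_norm(cand: Coeffs) -> float:
    """L2 norm of the mixing-coefficient vector (an assumed reading of the text)."""
    return math.sqrt(sum(v * v for v in cand.values()))


def select_tuples(p_base: Coeffs, p_mix: Coeffs) -> Coeffs:
    """Build one candidate per excluded category and keep the lowest-norm one."""
    candidates: List[Coeffs] = [candidate_for_excluded(p_base, p_mix, x) for x in p_base]
    return min(candidates, key=l2_norm)


# Toy mixture: 60% no intervention, 40% intervention forcing category 0.
p_base = {0: 0.5, 1: 0.3, 2: 0.2}
p_mix = {0: 0.7, 1: 0.18, 2: 0.12}
best = select_tuples(p_base, p_mix)
```

In this toy case, excluding category 0 (which was in fact intervened on) yields negative coefficients and a large norm, while excluding category 1 or 2 recovers coefficients close to 0.6 and 0.4.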
Turning now to
A counter is initialized to zero, as shown at block 404, and the counter is incremented by one at block 406. A determination is made at block 408 whether the counter is equal to one. Because the counter is initially at one, a first variable is selected from the causal graph, and a set of intervention tuples is determined for the first variable, as shown in block 410. The first variable selected here can be based on the topological ordering of variables from the causal graph. For instance, using the causal graph 200 of
Next, a determination is made at block 414 regarding whether there are any additional variables. If there are additional variable(s), the process returns to block 406, at which the counter is incremented by one. Because the counter is no longer at one, the process continues from block 408 to block 412, at which the solution of the set of intervention tuples for C−1 variables (i.e., N variables) is lifted to a solution of a set of intervention tuples for C variables (i.e., N+1 variables). When the counter is at two, the set of intervention tuples determined for the single variable at block 410 is lifted to the first two variables at block 412. As described hereinabove, this can include solving a system of equations using the set of intervention tuples determined for the 1st variable and probability distributions determined for the first two variables (which can require marginalizing the data if the total number of variables is greater than the current N+1 variables). To enforce exclusion, each mixing coefficient is set to zero one at a time, and the system of equations is solved to determine the values of the remaining mixing coefficients. This provides multiple candidate sets of intervention tuples, one for each mixing coefficient that is set to zero. In some configurations, any non-zero mixing coefficient can be set to zero. Additionally, if a threshold is employed, any mixing coefficient below the threshold can be set to zero, and the remaining coefficients above the threshold can be renormalized. The candidate sets of intervention tuples are compared, and one is selected as the set of intervention tuples for the first two variables. In some instances, this can comprise computing the L2 norm for each candidate set of intervention tuples and selecting the candidate set of intervention tuples with the lowest L2 norm.
As shown at block 414, a determination is made regarding whether there are any additional variables. If so, the process of blocks 406 through 412 is repeated. In particular, the set of intervention tuples for C equal to N+1 variables is determined using the set of intervention tuples found at block 412 when C was equal to N. The variable selected at each iteration can be based on the topological ordering of variables from the causal graph. As an example using the causal graph 200 of
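The control flow of blocks 404 through 414 can be sketched as a loop over a topological ordering of the causal graph. The callables `solve_first` and `lift` are hypothetical placeholders for the single-variable solve of block 410 and the lifting step of block 412, and the three-node graph is invented for illustration; it is not the causal graph 200. Only the ordering and iteration logic is shown.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence, Tuple


def identify_tuples(
    parents: Dict[str, Sequence[str]],
    solve_first: Callable[[str], Any],
    lift: Callable[[Any, List[str]], Any],
) -> Tuple[List[str], Any]:
    """Iterate over variables in topological order, lifting N -> N+1 each step."""
    # TopologicalSorter expects node -> predecessors, so parents are visited first.
    order = list(TopologicalSorter(parents).static_order())
    counter = 0      # block 404
    tuples = None
    for var in order:
        counter += 1                              # block 406
        if counter == 1:                          # block 408
            tuples = solve_first(var)             # block 410
        else:
            tuples = lift(tuples, order[:counter])  # block 412
    return order, tuples                          # block 414: no variables remain


# Hypothetical graph A -> B, A -> C, B -> C, with stub solvers that record calls.
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}
calls: List[Tuple[str, Tuple[str, ...]]] = []


def solve_first(var: str) -> Dict[str, Tuple[str, ...]]:
    calls.append(("first", (var,)))
    return {"vars": (var,)}


def lift(prev: Any, variables: List[str]) -> Dict[str, Tuple[str, ...]]:
    calls.append(("lift", tuple(variables)))
    return {"vars": tuple(variables)}


order, final = identify_tuples(parents, solve_first, lift)
```

Passing the solvers in as callables keeps the loop independent of how the system of equations is built at each step.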
With reference now to
As shown at block 506, an intervention, from the set of intervention tuples, that maximizes the probability that the sample resulted from the intervention is determined. The sample is matched to the determined intervention, as shown at block 508.
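A minimal sketch of blocks 506 and 508 follows. It assumes the probability that a sample resulted from an intervention is scored as the mixing coefficient times the likelihood of the sample under that intervention, with the likelihood computed by dropping the conditional distribution of each intervened variable. The scoring rule, the data layout, the function names, and the two-variable network are assumptions for illustration.

```python
from typing import Dict, Mapping, Sequence, Tuple

# An intervention is a tuple of (variable, forced value) pairs; () means no intervention.
Intervention = Tuple[Tuple[str, int], ...]
Sample = Mapping[str, int]
Cpds = Mapping[str, Mapping[Tuple[int, ...], Sequence[float]]]


def interventional_prob(sample: Sample, intervention: Intervention,
                        parents: Mapping[str, Sequence[str]], cpds: Cpds) -> float:
    """Likelihood of `sample` when the variables in `intervention` are forced."""
    forced = dict(intervention)
    prob = 1.0
    for var, value in sample.items():
        if var in forced:
            if forced[var] != value:
                return 0.0  # sample contradicts the forced value
            continue        # intervened variable contributes no CPD factor
        parent_values = tuple(sample[p] for p in parents[var])
        prob *= cpds[var][parent_values][value]
    return prob


def assign_intervention(sample: Sample, tuples: Dict[Intervention, float],
                        parents: Mapping[str, Sequence[str]], cpds: Cpds) -> Intervention:
    """Return the intervention with the highest (unnormalized) posterior score."""
    scores = {t: pi * interventional_prob(sample, t, parents, cpds)
              for t, pi in tuples.items()}
    return max(scores, key=scores.get)


# Hypothetical network A -> B with binary variables.
parents = {"A": (), "B": ("A",)}
cpds = {"A": {(): [0.5, 0.5]}, "B": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}}
tuples = {(): 0.6, (("B", 1),): 0.4}

first = assign_intervention({"A": 0, "B": 1}, tuples, parents, cpds)
second = assign_intervention({"A": 0, "B": 0}, tuples, parents, cpds)
```

Here the sample with B=1 scores 0.03 without an intervention and 0.2 under the intervention forcing B to 1, so it is matched to that intervention; the sample with B=0 is incompatible with it and is matched to no intervention.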
Performance Evaluation
Two experiments were run to assess the performance of the technology described herein. In the first experiment, the approach of the technology described herein was compared to a brute force baseline on a small graph (since brute force does not scale) using accuracy metrics. In the second experiment, random causal graphs were simulated, samples were generated from the causal graphs, and the performance of the technology described herein was evaluated using accuracy metrics. Note that these experiments are focused on comparing the performance of recovering the set of intervention tuples of the original mixture. The experiments do not compare the performance of the final sample to intervention mapping since it depends on the previous step and the more accurate the identification of intervention tuples, the more accurate the mapping will be. Basically, the errors introduced are mostly due to the algorithm that finds the intervention tuples, and the error from mapping of samples to interventions is essentially due to noise from sampling.
Comparison with brute force baseline: Note that the brute force algorithm will only work for a very small number of nodes. A causal graph on 4 nodes was generated with each node having 4 categories, creating an input mixture with 16 intervention components. 10000 samples were obtained from both the actual model and the mixture, and the two algorithms (i.e., the technology described herein and brute force) were applied to these samples. Since samples are being used, both algorithms will find non-zero mixing coefficients for components, and therefore, to correctly identify the unknown interventions, a threshold was applied to the mixing proportions. To have a fair comparison between the algorithms, the same threshold of 0.001 was used for both. Table 1 below compares some accuracy metrics of both these algorithms. Recall is the number of interventions that were correctly identified. Precision is the fraction of correct interventions among the ones recovered. RMSE is the root mean squared error in the mixing proportions. Note that for even slightly larger graphs, the brute force algorithm will not be tractable and the algorithm of the current technology will be the only known one that is efficient.
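The thresholding step and the precision and recall metrics described above can be sketched as follows. The function names and the toy coefficients are assumptions and do not reproduce the reported experiment; recall is computed here as a fraction of the actual interventions.

```python
from typing import Dict, Set, Tuple


def apply_threshold(coeffs: Dict[str, float], threshold: float = 0.001) -> Dict[str, float]:
    """Drop coefficients below `threshold` and renormalize the rest to sum to one."""
    kept = {t: c for t, c in coeffs.items() if c >= threshold}
    total = sum(kept.values())
    return {t: c / total for t, c in kept.items()}


def precision_recall(true_targets: Set[str], found_targets: Set[str]) -> Tuple[float, float]:
    """Precision: correct / recovered. Recall: correct / actual."""
    correct = len(true_targets & found_targets)
    precision = correct / len(found_targets) if found_targets else 0.0
    recall = correct / len(true_targets) if true_targets else 0.0
    return precision, recall


# Toy estimate: one spurious component carries a tiny coefficient due to sampling noise.
estimated = {"do(A=0)": 0.5, "do(B=1)": 0.4995, "do(C=2)": 0.0005}
actual = {"do(A=0)", "do(B=1)", "do(D=1)"}

cleaned = apply_threshold(estimated)
precision, recall = precision_recall(actual, set(cleaned))
```

The spurious component falls below 0.001 and is removed, so both recovered interventions are correct, while one actual intervention remains undetected.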
Performance for larger graphs and varying sample sizes. A simulation study was performed to experimentally analyze performance of the technology described herein.
Simulation Setup. For each simulation setting (N=number of nodes, M=number of samples), a directed acyclic graph was sampled on N nodes (each having 3 categories) from the Scale-Free (SF) model, with the number of edges chosen uniformly at random from [N, 5N]. For each graph, the CPD of each node was modeled as a multinoulli distribution with Dirichlet priors having fixed parameter α=2 for all categories. This was done to conform with positivity. This generated a causal Bayesian network. A set of M samples was generated using ancestral sampling on this network and used as input for an algorithm using the technology described herein. To create a mixture, an integer m was chosen uniformly at random from the set [4,16] and used as the number of interventions in the mixture. Each of the m intervention targets of the mixture was then built in turn. First, the size of the target was chosen by picking an integer r uniformly at random from the set {0, . . . , N}. Then, an r-sized subset of [N] was chosen uniformly at random, defining the variables in the target. For each of these variables, a category was chosen uniformly at random and removed from consideration (to satisfy exclusion). From the remaining categories, one was selected uniformly at random for each variable in the target and used to define the intervention. Finally, m scalar weights were generated for mixing coefficients such that they sum to 1. To make sure that these mixing coefficients are not too small, they were generated with Dirichlet priors with all parameter values fixed to 2. A set of M samples was generated from this mixture model and used as input for the algorithm. Parameters ε, δ used by the algorithm were set to 0.01 and 1/M respectively. The settings for N and M used in the experiments were (N, M)∈{4,8,12}×{2^4, 2^5, . . . , 2^20}, where × is the direct product of sets.
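The construction of the mixture described in the setup can be sketched as below; graph sampling and ancestral sampling are omitted. The function name and seed are hypothetical. One point of interpretation is flagged: the text can be read as drawing the excluded category separately for each target, whereas this sketch draws it once per variable so that exclusion holds across the whole mixture. The Dirichlet weights are obtained by normalizing independent Gamma(2, 1) draws, which is a standard construction.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_mixture(
    n_vars: int, n_categories: int = 3, rng: Optional[random.Random] = None
) -> Tuple[List[Dict[int, int]], List[float], Dict[int, int]]:
    """Draw intervention targets and mixing coefficients for one simulated mixture."""
    rng = rng or random.Random()
    # Assumed reading: one excluded category per variable, shared by all targets.
    excluded = {v: rng.randrange(n_categories) for v in range(n_vars)}

    m = rng.randint(4, 16)  # number of interventions, both ends inclusive
    targets: List[Dict[int, int]] = []
    for _ in range(m):
        r = rng.randint(0, n_vars)  # target size; 0 means no intervention
        variables = sorted(rng.sample(range(n_vars), r))
        target = {}
        for v in variables:
            allowed = [c for c in range(n_categories) if c != excluded[v]]
            target[v] = rng.choice(allowed)
        targets.append(target)  # duplicate targets are not filtered in this sketch

    # Dirichlet(2, ..., 2) weights via normalized Gamma(shape=2, scale=1) draws.
    gammas = [rng.gammavariate(2.0, 1.0) for _ in range(m)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    return targets, weights, excluded


targets, weights, excluded = sample_mixture(n_vars=4, rng=random.Random(0))
```

Each returned target maps variable indices to forced categories, and `weights[i]` is the mixing coefficient of `targets[i]`.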
Evaluation Metrics: Let πt denote the mixing coefficient of a target t in the actual set of intervention targets, and let π̂s denote the mixing coefficient of a target s in the set of intervention targets computed by the algorithm of the technology described herein. The following evaluation metrics were used to evaluate the performance of the algorithm.
- 1. Recall: Proportion of targets in the actual set that were correctly identified in the computed set.
- 2. Root Mean Squared Error: Root-mean-squared error (RMSE) in the mixing coefficients.
- 3. False-Positive RMSE: RMSE in the mixing coefficients of the incorrectly identified targets.
- 4. False-Negative RMSE: RMSE in the mixing coefficients of targets not identified.
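The three RMSE metrics can be sketched as follows. The exact formulas are not reproduced in this section, so the sketch assumes that the overall RMSE is taken over the union of actual and computed targets with a missing coefficient treated as zero, that the false-positive RMSE covers computed targets absent from the actual set, and that the false-negative RMSE covers actual targets absent from the computed set. These definitions, the function names, and the toy values are assumptions.

```python
import math
from typing import Dict, Iterable


def rmse(errors: Iterable[float]) -> float:
    """Root mean squared value of `errors`; zero when there are no terms."""
    errors = list(errors)
    if not errors:
        return 0.0
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_metrics(true_pi: Dict[str, float], est_pi: Dict[str, float]) -> Dict[str, float]:
    """Overall, false-positive, and false-negative RMSE under the assumed definitions."""
    true_t, est_t = set(true_pi), set(est_pi)
    return {
        # Missing coefficients count as zero on either side.
        "rmse": rmse(true_pi.get(t, 0.0) - est_pi.get(t, 0.0) for t in true_t | est_t),
        "fp_rmse": rmse(est_pi[t] for t in est_t - true_t),
        "fn_rmse": rmse(true_pi[t] for t in true_t - est_t),
    }


# Toy example: target "c" is missed and target "d" is spurious.
true_pi = {"a": 0.5, "b": 0.3, "c": 0.2}
est_pi = {"a": 0.5, "b": 0.4, "d": 0.1}
metrics = rmse_metrics(true_pi, est_pi)
```

Separating the false-positive and false-negative terms shows whether error comes from spurious targets with small weight or from missed targets with substantial weight.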
Results Discussion:
In
In
In
To further understand the performance of the technology described herein with respect to the number of nodes, in
Having described implementations of the present disclosure, an exemplary operating environment in which embodiments of the present invention can be implemented is described below in order to provide a general context for various aspects of the present disclosure. Referring initially to
The invention can be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The invention can be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. The invention can also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With reference to
Computing device 900 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 900 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 900. Computer storage media does not comprise signals per se. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 912 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory can be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 900 includes one or more processors that read data from various entities such as memory 912 or I/O components 920. Presentation component(s) 916 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 918 allow computing device 900 to be logically coupled to other devices including I/O components 920, some of which can be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc. The I/O components 920 can provide a natural user interface (NUI) that processes air gestures, voice, or other physiological inputs generated by a user. In some instances, inputs can be transmitted to an appropriate network element for further processing. A NUI can implement any combination of speech recognition, touch and stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye-tracking, and touch recognition associated with displays on the computing device 900. The computing device 900 can be equipped with depth cameras, such as stereoscopic camera systems, infrared camera systems, RGB camera systems, and combinations of these for gesture detection and recognition. Additionally, the computing device 900 can be equipped with accelerometers or gyroscopes that enable detection of motion.
The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
Having identified various components utilized herein, it should be understood that any number of components and arrangements can be employed to achieve the desired functionality within the scope of the present disclosure. For example, the components in the embodiments depicted in the figures are shown with lines for the sake of conceptual clarity. Other arrangements of these and other components can also be implemented. For example, although some components are depicted as single components, many of the elements described herein can be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Some elements can be omitted altogether. Moreover, various functions described herein as being performed by one or more entities can be carried out by hardware, firmware, and/or software, as described below. For instance, various functions can be carried out by a processor executing instructions stored in memory. As such, other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions) can be used in addition to or instead of those shown.
Embodiments described herein can be combined with one or more of the specifically described alternatives. In particular, an embodiment that is claimed can contain a reference, in the alternative, to more than one other embodiment. The embodiment that is claimed can specify a further limitation of the subject matter claimed.
The subject matter of embodiments of the invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” can be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
For purposes of this disclosure, the word “including” has the same broad meaning as the word “comprising,” and the word “accessing” comprises “receiving,” “referencing,” or “retrieving.” Further, the word “communicating” has the same broad meaning as the word “receiving,” or “transmitting” facilitated by software or hardware-based buses, receivers, or transmitters using communication media described herein. In addition, words such as “a” and “an,” unless otherwise indicated to the contrary, include the plural as well as the singular. Thus, for example, the constraint of “a feature” is satisfied where one or more features are present. Also, the term “or” includes the conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well as a and b).
For purposes of a detailed discussion above, embodiments of the present invention are described with reference to a distributed computing environment; however, the distributed computing environment depicted herein is merely exemplary. Components can be configured for performing novel aspects of embodiments, where the term “configured for” can refer to “programmed to” perform particular tasks or implement particular abstract data types using code. Further, while embodiments of the present invention can generally refer to the technical solution environment and the schematics described herein, it is understood that the techniques described can be extended to other implementation contexts.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and inherent to the system and method. It will be understood that certain features and subcombinations are of utility and can be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.
Claims
1. A computerized method comprising:
- receiving, by an intervention identification module, a set of baseline samples, a set of samples with interventions, and a causal graph having a plurality of variables;
- iteratively determining, by the intervention identification module, a set of intervention tuples for N+1 variables from the causal graph until a final set of intervention tuples is generated for all variables in the causal graph by: selecting N+1 variables from the causal graph by incrementing N from a previous iteration; determining a set of intervention tuples for N variables; and lifting the set of intervention tuples for N variables to the set of intervention tuples for N+1 variables using the set of intervention tuples for N variables and estimated probability distributions for N+1 variables determined using the set of baseline samples and the set of samples with interventions; and
- assigning, by an intervention assignment module, each sample from at least a portion of the set of samples with interventions to an intervention using the final set of intervention tuples.
2. The computerized method of claim 1, wherein selection of the N+1 variables at each iteration is based on a topological ordering of the variables from the causal graph.
3. The computerized method of claim 1, wherein determining the set of intervention tuples for a first variable at a first iteration comprises:
- generating estimated probability distributions for the first variable by marginalizing data from the set of baseline samples and the set of samples with interventions;
- generating a first system of equations to determine mixing coefficients for interventions for the first variable;
- generating a first plurality of candidate sets of intervention tuples for the first variable by repeatedly setting each mixing coefficient in the first system of equations to zero and solving the first system of equations using the estimated probability distributions for the first variable; and
- selecting the set of intervention tuples for the first variable from the first plurality of candidate sets of intervention tuples.
4. The computerized method of claim 3, wherein generating the estimated probability distributions for the first variable includes perturbing the estimated probability distributions such that all probabilities are non-zero.
5. The computerized method of claim 3, wherein a first mixing coefficient for a first intervention tuple in the selected set of intervention tuples is set to zero based on the mixing coefficient being below a threshold and other mixing coefficients are renormalized based on setting the first mixing coefficient to zero.
6. The computerized method of claim 3, wherein the set of intervention tuples for the first variable is selected from the first plurality of candidate sets of intervention tuples by:
- computing an L2 norm for each candidate set of intervention tuples; and
- selecting the candidate set of intervention tuples with the lowest L2 norm.
7. The computerized method of claim 3, wherein determining the set of intervention tuples for the first variable and a second variable at the first iteration comprises:
- generating estimated probability distributions for the first and second variables using the set of baseline samples and the set of samples with interventions;
- generating a second system of equations to determine mixing coefficients for interventions for the first and second variables;
- generating a second plurality of candidate sets of intervention tuples for the first and second variables by repeatedly setting each mixing coefficient in the second system of equations to zero and solving the second system of equations using the set of intervention tuples for the first variable and the estimated probability distributions for the first and second variables; and
- selecting the set of intervention tuples for the first and second variables from the second plurality of candidate sets of intervention tuples.
8. The computerized method of claim 7, wherein determining the set of intervention tuples for the first variable, the second variable, and a third variable at a second iteration comprises:
- generating estimated probability distributions for the first, second, and third variables using the set of baseline samples and the set of samples with interventions;
- generating a third system of equations to determine mixing coefficients for interventions for the first, second, and third variables;
- generating a third plurality of candidate sets of intervention tuples for the first, second, and third variables by repeatedly setting each mixing coefficient in the third system of equations to zero and solving the third system of equations using the set of intervention tuples for the first and second variables and the estimated probability distributions for the first, second, and third variables; and
- selecting the set of intervention tuples for the first, second, and third variables from the third plurality of candidate sets of intervention tuples.
9. One or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform operations, the operations comprising:
- receiving a set of baseline samples, a set of samples with interventions, and a causal graph having a plurality of variables;
- generating estimated probability distributions for a first variable from the causal graph using the set of baseline samples and the set of samples with interventions;
- generating a first system of equations to determine mixing coefficients for interventions for the first variable;
- generating a first plurality of candidate sets of intervention tuples for the first variable by repeatedly setting each mixing coefficient in the first system of equations to zero and solving the first system of equations using the estimated probability distributions for the first variable;
- selecting a set of intervention tuples for the first variable from the first plurality of candidate sets of intervention tuples; and
- assigning each sample from at least a portion of the samples with an intervention based at least in part on the set of intervention tuples for the first variable.
10. The computer storage media of claim 9, wherein the first variable is selected based on a topological ordering of the variables from the causal graph.
11. The computer storage media of claim 9, wherein generating the estimated probability distributions for the first variable includes perturbing the estimated probability distributions such that all probabilities are non-zero.
12. The computer storage media of claim 9, wherein a first mixing coefficient for a first intervention tuple in the selected set of intervention tuples is set to zero based on the mixing coefficient being below a threshold and other mixing coefficients are renormalized based on setting the first mixing coefficient to zero.
13. The computer storage media of claim 9, wherein the set of intervention tuples for the first variable is selected from the first plurality of candidate sets of intervention tuples by:
- computing an L2 norm for each candidate set of intervention tuples; and
- selecting the candidate set of intervention tuples with the lowest L2 norm.
14. The computer storage media of claim 9, wherein the operations further comprise:
- generating estimated probability distributions for the first variable and a second variable from the causal graph using the set of baseline samples and the set of samples with interventions;
- generating a second system of equations to determine mixing coefficients for interventions for the first variable and the second variable;
- generating a second plurality of candidate sets of intervention tuples for the first variable and the second variable by repeatedly setting each mixing coefficient in the second system of equations to zero and solving the second system of equations using the set of intervention tuples for the first variable and the estimated probability distributions for the first variable and the second variable; and
- selecting a set of intervention tuples for the first variable and the second variable from the second plurality of candidate sets of intervention tuples.
15. A computer system comprising:
- a processor; and
- a computer storage medium storing computer-useable instructions that, when used by the processor, cause the computer system to perform operations comprising:
- receiving, by an intervention identification module, a set of baseline samples, a set of samples with interventions, and a causal graph having a plurality of variables;
- iteratively determining, by the intervention identification module, a set of intervention tuples for N+1 variables from the causal graph until a final set of intervention tuples is generated for all variables in the causal graph by: selecting N+1 variables from the causal graph by incrementing N from a previous iteration; determining a set of intervention tuples for N variables; and determining the set of intervention tuples for N+1 variables using the set of intervention tuples for N variables and estimated probability distributions for N+1 variables determined using the set of baseline samples and the set of samples with interventions; and
- assigning, by an intervention assignment module, each sample from at least a portion of the set of samples with interventions to an intervention using the final set of intervention tuples.
16. The system of claim 15, wherein selection of the N+1 variables at each iteration is based on a topological ordering of the variables from the causal graph.
17. The system of claim 15, wherein determining the set of intervention tuples for a first variable at a first iteration comprises:
- generating estimated probability distributions for the first variable by marginalizing data from the set of baseline samples and the set of samples with interventions, wherein generating the estimated probability distributions for the first variable includes perturbing the estimated probability distributions such that all probabilities are non-zero;
- generating a first system of equations to determine mixing coefficients for interventions for the first variable;
- generating a first plurality of candidate sets of intervention tuples for the first variable by repeatedly setting each mixing coefficient in the first system of equations to zero and solving the first system of equations using the estimated probability distributions for the first variable; and
- selecting the set of intervention tuples for the first variable from the first plurality of candidate sets of intervention tuples.
18. The system of claim 17, wherein a first mixing coefficient for a first intervention tuple in the selected set of intervention tuples is set to zero based on the mixing coefficient being below a threshold and other mixing coefficients are renormalized based on setting the first mixing coefficient to zero.
19. The system of claim 17, wherein the set of intervention tuples for the first variable is selected from the first plurality of candidate sets of intervention tuples by:
- computing an L2 norm for each candidate set of intervention tuples; and
- selecting the candidate set of intervention tuples with the lowest L2 norm.
20. The system of claim 17, wherein determining the set of intervention tuples for the first variable and a second variable at the first iteration comprises:
- generating estimated probability distributions for the first and second variables using the set of baseline samples and the set of samples with interventions;
- generating a second system of equations to determine mixing coefficients for interventions for the first and second variables;
- generating a second plurality of candidate sets of intervention tuples for the first and second variables by repeatedly setting each mixing coefficient in the second system of equations to zero and solving the second system of equations using the set of intervention tuples for the first variable and the estimated probability distributions for the first and second variables; and
- selecting the set of intervention tuples for the first and second variables from the second plurality of candidate sets of intervention tuples.
Type: Application
Filed: Feb 14, 2022
Publication Date: Aug 17, 2023
Inventors: Gaurav Sinha (BANGALORE), Abhinav Kumar (Amdiha)
Application Number: 17/671,082