Expected change in \(Y\) with 1-unit change in \(X\)
Predict
Population growth
Future resource availability
Winner of an election!
Expected turnout rate by age group
Predict
To be able to use X to predict Y we need…
both variables to have a relationship.
We can then use statistical models to summarize the relationship.
And use the models to predict Y, given any X
Give me some examples of statistical models often used to summarize relationships.
Measure vs Predict
# Set up some fake data
set.seed(390)
# Predictor
x <- rnorm(100, mean = -3, sd = 1)
# Noise
error <- rnorm(100, mean = 0)
# Variable we want to explain
y <- x + error
# Measure the relationship
cov(x, y) # covariance
[1] 1.105927
cor(x, y) # correlation
[1] 0.7318479
# or cov(x, y) / (sd(x) * sd(y))
mod <- lm(y ~ x)
coefficients(mod)[2]
x
1.098033
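Two identities sit behind these numbers: the correlation is the covariance rescaled by both standard deviations, and the slope from `lm()` is the covariance divided by the variance of `x`. A minimal sketch that checks both numerically, re-creating the same simulated data so the block runs on its own (the names `cor_by_hand`, `slope_by_hand`, and `slope_from_lm` are introduced here for illustration only):

```r
# Re-create the simulated data from the slide so this block is self-contained
set.seed(390)
x <- rnorm(100, mean = -3, sd = 1)
error <- rnorm(100, mean = 0)
y <- x + error

# Correlation is the covariance rescaled by both standard deviations
cor_by_hand <- cov(x, y) / (sd(x) * sd(y))

# The OLS slope is the covariance divided by the variance of the predictor
mod <- lm(y ~ x)
slope_by_hand <- cov(x, y) / var(x)
slope_from_lm <- coefficients(mod)[["x"]]

# Both comparisons hold up to floating-point tolerance
all.equal(cor_by_hand, cor(x, y))
all.equal(slope_by_hand, slope_from_lm)
```

Both are measures of association: they describe how strongly `x` and `y` move together, but on their own they do not produce a predicted value of `y`.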
Measure vs Predict
# To predict we need the full model
coefficients(mod)
(Intercept) x
0.3085034 1.0980332
# the slope also equals cov(x, y) / var(x)
hypothetical_x <- 10
# predict using the ENTIRE model (intercept + slope * x)
coefficients(mod)[[1]] + coefficients(mod)[[2]] * hypothetical_x
[1] 11.28884
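The manual calculation above can also be done with R's built-in `predict()`, which applies the intercept and slope for us. The sketch below re-creates the same simulated data so it runs on its own and checks that the two approaches agree; `by_hand` and `by_predict` are names introduced here for illustration.

```r
# Re-create the simulated data and model from the slide
set.seed(390)
x <- rnorm(100, mean = -3, sd = 1)
error <- rnorm(100, mean = 0)
y <- x + error
mod <- lm(y ~ x)

hypothetical_x <- 10

# Prediction by hand: intercept + slope * x
by_hand <- coefficients(mod)[[1]] + coefficients(mod)[[2]] * hypothetical_x

# Same prediction via predict(); newdata must use the predictor's name, x
by_predict <- predict(mod, newdata = data.frame(x = hypothetical_x))

# The two agree up to floating-point tolerance
all.equal(by_hand, unname(by_predict))

# predict() also handles several hypothetical values at once
predict(mod, newdata = data.frame(x = c(-5, 0, 10)))
```

Note that `x = 10` lies far outside the simulated values of `x` (centered at -3 with a standard deviation of 1), so this prediction is an extrapolation and relies entirely on the linear form of the model.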
Measuring, predicting, and … explaining?
We can measure relationships
We can summarize relationships in models and use them to predict
For a variable X to be a good predictor of Y, X and Y need to have an association.
But predictively powerful models tell us nothing about why X and Y are associated.
Causality as a way to explain
When we are interested in explaining, we might want to know if X is just associated with Y or if X causes Y
The “causal effect of X on Y” is the change in the outcome Y produced by a change in the treatment X
Does exposure to misinformation cause polarization?
What is X? What is Y?
Causality as a way to explain
Broadly, there are two types of causal questions:
Causes of consequences, e.g., Does democracy cause economic development?
Consequences of causes, e.g., What is the causal effect of colonialism on the economic development of East African countries?
Which do you think is more common in policy work?
Causality: the FPOCI
To answer “does democracy cause economic development?” we need to:
compare the economic development of democratic countries with
the economic development of those same democratic countries had they not been democracies.
We can NEVER do this
We call this problem the fundamental problem of causal inference
Causality
One way to think about causality from this angle more formally is using the Potential Outcomes framework
Imagine n participants (indexed by i) in an experiment to test a new blood pressure medicine (X).
Participants can either take (X=1) or not take (X=0) the medicine.
Causality
Each participant \(i\) has two potential health outcomes:
\(Y_i(X_i = 0)\) = person \(i\)’s health outcome if they do not take the medicine, and
\(Y_i(X_i = 1)\) = person \(i\)’s health outcome if they take the medicine.
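| Participant | Medicine? (X) | Observed BP | \(Y_i(0)\) | \(Y_i(1)\) | \(Y_i(1) - Y_i(0)\) |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 121 | 125 | 121 | -4 |
| 2 | 1 | 140 | 145 | 140 | -5 |
| 3 | 0 | 120 | 120 | 119 | -1 |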
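If we could see both potential outcomes for everyone, as in this table, individual and average causal effects would be simple arithmetic. A small sketch using the three participants above; the data frame `po` and its column names are introduced here for illustration.

```r
# Potential outcomes for the three participants in the table
po <- data.frame(
  participant = 1:3,
  x  = c(1, 1, 0),        # 1 = took the medicine, 0 = did not
  y0 = c(125, 145, 120),  # blood pressure without the medicine, Y_i(0)
  y1 = c(121, 140, 119)   # blood pressure with the medicine, Y_i(1)
)

# Individual causal effect: Y_i(1) - Y_i(0)
po$effect <- po$y1 - po$y0
po$effect        # -4 -5 -1

# Average causal effect across the three participants
mean(po$effect)  # -10/3, about -3.33
```

The calculation is only possible because the table pretends we know both columns for every participant.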
Causality
We can never observe both \(Y_i\) when \(i\) took the medicine and \(Y_i\) when they did not!
We only observe the factual outcome, never the counterfactual outcome.
| Participant | Medicine? (X) | Observed BP | \(Y_i(0)\) | \(Y_i(1)\) |
| --- | --- | --- | --- | --- |
| 1 | 1 | 121 | ??? | 121 |
| 2 | 1 | 140 | ??? | 140 |
| 3 | 0 | 120 | 120 | ??? |
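One way to see where the `???` entries come from: the observed outcome is whichever potential outcome matches the treatment actually received, and the other one is missing. The sketch below uses the full potential outcomes from the earlier hypothetical table (which a real study would never have) and masks the counterfactual with `NA`; the object names are introduced here for illustration.

```r
# Treatment status and the (normally unknowable) full potential outcomes
x  <- c(1, 1, 0)
y0 <- c(125, 145, 120)
y1 <- c(121, 140, 119)

# Observed outcome: Y_i(1) if treated, Y_i(0) if not
y_obs <- x * y1 + (1 - x) * y0
y_obs  # 121 140 120

# What the researcher actually sees: the counterfactual is missing (NA)
seen <- data.frame(
  participant = 1:3,
  x     = x,
  y_obs = y_obs,
  y0    = ifelse(x == 0, y0, NA),
  y1    = ifelse(x == 1, y1, NA)
)
seen

# Individual effects cannot be computed: every difference is NA
seen$y1 - seen$y0
```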
Causality: What to do?
What if we think in terms of average causal effects instead?
What if we could plug in the mean Y value of observations assigned to the treatment as \(\overline{Y(1)}\) and the mean of the control observations as \(\overline{Y(0)}\)?
Well, our problems would be solved! At least on average.
What would we need to assume about observations in treatment and control?
That these two populations are, on average, the same in every relevant dimension.
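A simulation can illustrate why this assumption matters. The sketch below is entirely hypothetical: the sample size, the blood pressure distribution, the constant effect of -4, and the self-selection rule are assumptions chosen for illustration, not values from the slides. It compares the difference in observed group means under random assignment and under self-selection.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 10000

# Hypothetical potential outcomes: the medicine lowers BP by 4 for everyone
y0 <- rnorm(n, mean = 130, sd = 10)
y1 <- y0 - 4
true_ate <- mean(y1 - y0)  # -4 by construction

# Case 1: random assignment, so the groups are alike on average
x_random <- rbinom(n, 1, 0.5)
y_obs_random <- ifelse(x_random == 1, y1, y0)
diff_random <- mean(y_obs_random[x_random == 1]) -
  mean(y_obs_random[x_random == 0])

# Case 2: self-selection, where people with higher baseline BP
# are more likely to take the medicine (groups differ before treatment)
x_selected <- rbinom(n, 1, plogis((y0 - 130) / 10))
y_obs_selected <- ifelse(x_selected == 1, y1, y0)
diff_selected <- mean(y_obs_selected[x_selected == 1]) -
  mean(y_obs_selected[x_selected == 0])

diff_random    # close to the true effect of -4
diff_selected  # far from -4: the groups were not comparable
```

Under random assignment the difference in means lands near the true average effect. Under self-selection it mixes the effect of the medicine with the pre-existing difference between the groups.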
Causality: on average
If X (receiving treatment) is independent of the potential outcomes \(Y(0)\) and \(Y(1)\), then