Causal Inference 3

Observational Data

Jeremy Springman

University of Pennsylvania

Global Development: Intermediate Topics in Politics, Policy, and Data

PSCI 3200 - Spring 2025

Logistics

Today
- DSS Ch 5
- Create a git repo for this class (psci3200_yourname)
Monday
- Migration readings (will post before Monday)
- Git repo workshop (semi-optional)

\[ Y_i = \alpha + \beta X_i + \epsilon_i \]

Estimating model parameters

\[ \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \]

Coefficient

\[ \hat{\beta} = \Delta \hat{Y} / \Delta X \]

What are residuals?

\[ \hat{\epsilon}_i = Y_i - \hat{Y}_i \]

How do we minimize them?

\[ SSR = \sum_{i=1}^{N} \hat{\epsilon}_i^2 \]
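
A minimal sketch of what "minimizing the SSR" means in practice, in Python with NumPy. The data are simulated, and the true values (intercept 2.0, slope 0.5) and all variable names are assumptions made for illustration. It computes the one-predictor least-squares estimates, the residuals, and the SSR, then checks that nudging the line away from the estimates makes the SSR larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): true alpha = 2.0, true beta = 0.5
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)


def ssr(alpha, beta, x, y):
    """Sum of squared residuals for the candidate line alpha + beta * x."""
    resid = y - (alpha + beta * x)
    return float(np.sum(resid ** 2))


# Least-squares estimates with one predictor:
# slope = sample covariance of (X, Y) / sample variance of X
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()

y_hat = alpha_hat + beta_hat * x   # fitted values
resid = y - y_hat                  # residuals
ssr_ols = ssr(alpha_hat, beta_hat, x, y)

# Any other line fits worse: shifting either estimate raises the SSR
ssr_shifted_slope = ssr(alpha_hat, beta_hat + 0.1, x, y)
ssr_shifted_intercept = ssr(alpha_hat + 0.1, beta_hat, x, y)

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
print(f"SSR at estimates: {ssr_ols:.2f}")
print(f"SSR with slope + 0.1: {ssr_shifted_slope:.2f}")
print(f"SSR with intercept + 0.1: {ssr_shifted_intercept:.2f}")
```

With an intercept in the model, the residuals from the fitted line sum to zero (up to rounding), which is a quick sanity check on any hand-computed fit.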

You must control for…
- everything (observed and unobserved) that affects both the treatment variable and the outcome variable
You must not control for…
- anything that is affected by both the treatment variable and the outcome variable
You need to think carefully before controlling for…
- anything that is affected by the treatment variable that also affects the outcome variable

\[ Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i \]
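
The three rules about controls can be illustrated with a small simulation. This is a sketch under stated assumptions, not a general result: the data-generating processes, the coefficient values, and the helper `ols` are all hypothetical, and the true total effect of X on Y is set to 1.0 in every scenario. Each scenario compares the coefficient on X with and without the extra variable in the regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
TRUE_EFFECT = 1.0  # assumed total effect of x on y in every scenario


def ols(y, *regressors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# 1) Confounder: z affects both the treatment and the outcome -> control for it
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = TRUE_EFFECT * x + 1.5 * z + rng.normal(size=n)
b_conf_omitted = ols(y, x)[1]      # biased upward (about 1.73 in this setup)
b_conf_adjusted = ols(y, x, z)[1]  # close to 1.0

# 2) Collider: c is affected by both treatment and outcome -> do not control
x2 = rng.normal(size=n)
y2 = TRUE_EFFECT * x2 + rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
b_coll_omitted = ols(y2, x2)[1]      # close to 1.0
b_coll_adjusted = ols(y2, x2, c)[1]  # badly biased (near 0 in this setup)

# 3) Mediator: m is affected by treatment and affects outcome -> think carefully
x3 = rng.normal(size=n)
m = x3 + rng.normal(size=n)
y3 = 0.5 * x3 + 0.5 * m + rng.normal(size=n)  # total effect = 0.5 + 0.5 * 1
b_med_omitted = ols(y3, x3)[1]      # total effect, close to 1.0
b_med_adjusted = ols(y3, x3, m)[1]  # direct effect only, close to 0.5

for label, without, with_ctrl in [
    ("confounder", b_conf_omitted, b_conf_adjusted),
    ("collider", b_coll_omitted, b_coll_adjusted),
    ("mediator", b_med_omitted, b_med_adjusted),
]:
    print(f"{label:>10}: without = {without:.2f}, with control = {with_ctrl:.2f}")
```

In the mediator scenario neither number is wrong: they answer different questions (total effect versus the effect not running through `m`), which is why that case calls for careful thought and not a fixed rule.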

In the real world, there are always threats to inference that we can’t measure, observe, or understand well enough to adjust for

A research design that allows us to isolate a causal effect from observational data
Approximates an experiment by ensuring that the treatment and control group are similar at baseline
These strategies rely on assumptions that we can attempt to validate

Internal validity
External validity
What are the trade-offs between experiments and observational studies?
- Experiments typically have stronger internal validity
- But… they often rely on synthetic treatments and convenience samples, which limits external validity
Where are these studies used in the real world?