Causal Inference 3

Observational Data

Author: Jeremy Springman

Affiliation: University of Pennsylvania

Logistics

Assignments

  • Today
    • DSS Ch 5
    • Create a git repo for this class (psci3200_yourname)
  • Monday
    • Migration readings (will post before Monday)
    • Git repo workshop (semi-optional)

Agenda


  1. Review Linear Regression
  2. Causal Inference with Observational Data (pt. 1)
  3. Workshop
  4. Causal Inference with Observational Data (pt. 2)

Linear Regression

Linear Regression Model


\[ Y_i = \alpha + \beta X_i + \epsilon_i \]

Linear Regression Model


Estimating model parameters

\[ \hat{Y_i} = \hat{\alpha} + \hat{\beta} X_i \]

Coefficient

\[ \hat{\beta} = \Delta{\hat{Y}} / \Delta{X} \]

Minimizing the Residuals


What are residuals

\[ \hat{\epsilon}_i = Y_i - \hat{Y_i} \]

How do we minimize them?

\[ SSR = \sum_{i=1}^{N} \hat{\epsilon}_i^2 \]
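
As a minimal sketch of what minimizing the sum of squared residuals looks like in practice, the Python snippet below fits the one-predictor model on simulated data using the standard closed-form least-squares formulas. The data, the seed, and the "true" values of 2.0 and 0.5 are hypothetical and chosen only for illustration; the course itself may use different software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: true alpha = 2.0, true beta = 0.5
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# Closed-form least-squares estimates for one predictor
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Fitted values, residuals, and the sum of squared residuals (SSR)
y_hat = alpha_hat + beta_hat * x
residuals = y - y_hat
ssr = np.sum(residuals ** 2)

# Any other line, here one with a slightly different slope, has a larger SSR
ssr_other = np.sum((y - (alpha_hat + (beta_hat + 0.1) * x)) ** 2)

print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
print(f"SSR at OLS fit={ssr:.2f}, SSR with perturbed slope={ssr_other:.2f}")
```

The comparison at the end is the point of the exercise: moving the slope away from the least-squares value can only increase the SSR.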

Causal Inference 3 (pt. 1)

Causality without Randomization

  • You must control for…
    • everything (observed and unobserved) that affects both the treatment variable and the outcome variable
  • You must not control for…
    • anything that is affected by both the treatment variable and the outcome variable
  • You need to think carefully before controlling for…
    • anything that is affected by the treatment variable that also affects the outcome variable
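
The first rule above can be illustrated with a small simulation. This is a sketch under stated assumptions: the variable `z` plays the role of a confounder that affects both the treatment `x` and the outcome `y`, the true effect of `x` on `y` is set to 1.0, and all coefficients, the seed, and the helper `ols` are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical data-generating process
z = rng.normal(size=n)                       # confounder
x = 0.8 * z + rng.normal(size=n)             # treatment, affected by z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # outcome; true effect of x is 1.0


def ols(outcome, *regressors):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


naive = ols(y, x)[1]         # omits the confounder
adjusted = ols(y, x, z)[1]   # controls for the confounder

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this setup the naive coefficient is biased upward because it absorbs part of the effect of `z`, while the adjusted coefficient lands close to the true value of 1.0. The adjustment only works here because `z` is observed, which is exactly what cannot be guaranteed with real observational data.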

Multiple Regression


\[ Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i \]

  • How does our interpretation of \(\alpha\) change?
  • How does our interpretation of \(\beta_1\) change?
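
One way to approach these two questions, as a sketch that assumes the errors have conditional mean zero, \(E[\epsilon_i \mid X_{i1}, X_{i2}] = 0\) (an assumption not stated on the slide):

\[ \alpha = E[Y_i \mid X_{i1} = 0, X_{i2} = 0] \]

\[ \beta_1 = E[Y_i \mid X_{i1} = x_1 + 1, X_{i2} = x_2] - E[Y_i \mid X_{i1} = x_1, X_{i2} = x_2] \]

Under that assumption, \(\alpha\) is the expected outcome when both predictors equal zero, and \(\beta_1\) is the change in the expected outcome for a one-unit increase in \(X_{i1}\) while holding \(X_{i2}\) fixed.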

Threats to Inference


  • Confounders
  • Colliders
  • Mechanisms
  • Reverse Causality
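
To make the collider threat concrete, here is a hedged simulation sketch. It assumes `x` and `y` are generated independently (so the true effect is zero) and that a third variable `c` is affected by both. All names, coefficients, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical data-generating process: x has no effect on y
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(size=n)   # collider: affected by both x and y


def ols(outcome, *regressors):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


without_collider = ols(y, x)[1]   # close to the true effect of zero
with_collider = ols(y, x, c)[1]   # controlling for c induces a negative association

print(f"without controlling for c: {without_collider:.2f}")
print(f"controlling for c:         {with_collider:.2f}")
```

Controlling for `c` creates an association between `x` and `y` that does not exist in the data-generating process, which is why adding more controls is not always safer.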

Workshop

Causal Inference 3 (pt. 2)

Identification strategy

In the real world, there are always threats to inference that we can’t measure/observe or understand well enough to adjust for

  • A research design that allows us to isolate a causal effect from observational data
  • Approximates an experiment by ensuring that the treatment and control group are similar at baseline
  • These strategies rely on assumptions that we can attempt to validate

Holy Trinity of Causal Inference


  1. Difference-in-Differences
  2. Regression Discontinuity
  3. Instrumental Variables
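
As an illustration of the first strategy in the list, the sketch below computes a two-group, two-period difference-in-differences estimate on simulated data. It assumes parallel trends by construction: both groups share the same time trend, and the treated group differs only by a fixed baseline gap plus the treatment effect. The numbers (baseline gap 1.0, trend 0.5, effect 2.0), the seed, and the helper `cell_mean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

treated = rng.integers(0, 2, size=n)   # 1 = treated group, 0 = comparison group
post = rng.integers(0, 2, size=n)      # 1 = after treatment, 0 = before

# Hypothetical outcome: baseline gap 1.0, common trend 0.5, true effect 2.0
y = 3.0 + 1.0 * treated + 0.5 * post + 2.0 * treated * post + rng.normal(size=n)


def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return y[(treated == group) & (post == period)].mean()


# Change over time in the treated group minus change in the comparison group
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"difference-in-differences estimate: {did:.2f}")
```

Subtracting the comparison group's change removes the common time trend, and differencing within each group removes the fixed baseline gap, so the estimate should land near the true effect of 2.0 in this simulation.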

Validity

  • Internal validity
  • External validity
  • What are the trade-offs between experiments and observational studies?
    • Experiments have more internal validity
    • But… they often have synthetic treatments, convenience samples
  • Where are these studies used in the real world?

Adjusting on Observables


  • Matching
  • Weighting
  • Synthetic Control (very fancy weighting)