Migration 1

Plus Difference-in-Differences

Jeremy Springman

University of Pennsylvania

Global Development: Intermediate Topics in Politics, Policy, and Data

PSCI 3200 - Spring 2024

Logistics

Assignments

Agenda


  1. Migration
  2. Causal Inference with Observational Data: Difference-in-Differences
  3. Using git

Migration Overview

Overview of International Migration

Economic Equilibrium


  • Push factors
  • Pull factors
  • Costs

Economic Equilibrium: Push


Push factors: origin country factors that affect current well-being

  • Demographics: youth bulge
  • Living standards: poor infrastructure, crime
  • Economic opportunities: unemployment
  • Politics: exclusion

Economic Equilibrium: Pull


Pull factors: destination country factors that affect expected well-being

  • Demographics: aging population
  • Living standards: good infrastructure, security
  • Economic opportunities: labor demand, credit
  • Politics: inclusion

Economic Equilibrium: Costs


Costs: factors that shape the costs of migrating

  • Legal restrictions
  • Transportation
  • Asset mobility

Economic Equilibrium Perspective


  • Migration decisions are based on expected return
  • Movements from poorer to richer regions will equalize wages
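
One way to write the expected-return logic above, using notation that is assumed here rather than taken from the slides: person \(i\) migrates when the expected gain in well-being exceeds the cost of moving.

\[ \text{Migrate}_i = 1 \quad \text{if} \quad E[W_i^{\text{dest}}] - E[W_i^{\text{origin}}] > C_i \]

In this sketch, pull factors raise \(E[W_i^{\text{dest}}]\), push factors lower \(E[W_i^{\text{origin}}]\), and legal, transportation, and asset-mobility barriers raise \(C_i\).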

Migration Systems

Macro linkages between sending and receiving countries

  • Migration is driven by prior links between sending and receiving countries
    • Colonization, political or cultural influence, trade, language

Micro linkages between households

  • Migration is driven by prior links between individuals and households
    • Social ties to households in the destination country

3ie Evidence Review

Irregular Migration Evidence Review

Objective:

  • Review evidence on the efficacy of interventions designed to target the root causes of irregular migration

Root causes: social and political conditions that induce departures

  1. Lack of economic opportunity
  2. Lack of capacity to adapt to shocks
  3. High levels of violence
  4. Lack of regular migration channels

Irregular Migration Evidence Review


What is irregular migration?

  • Migration outside legal channels

Why is it different from regular migration?

  • Additional risks relative to legal migration (violence, exploitation, limited access to the legal system)

Findings

Studies reporting migration outcomes are concentrated on 3 intervention categories:

  1. Human capital strengthening
  2. Active labor market policies
  3. Information campaigns

Conclusions


  • Some evidence on interventions that address the root causes of irregular migration
  • Almost no evidence looking at irregular migration as a primary outcome
  • A number of studies are ongoing

Gazeaud et al. (2023)

Constraints to migration


  • Migration causes huge income gains
  • Desire for migration is extremely high
  • Actual migration flows are relatively small
  • Why?

Intervention

Conditional cash transfer

  • Cash payment for labor on public works

Mechanisms:

  1. Liquidity
  2. Opportunity cost
  3. Collateral and access to credit
  4. Risk-aversion

Findings

  • Treatment households 38% more likely to migrate
  • Spillovers?
  • Mechanisms?

Findings

Policy Implications

  • The poorest individuals in sending countries are not the most likely to migrate (Clemens and Mendola, 2020)
  • Improving welfare in sending countries will not necessarily reduce desire to migrate
  • Networks may be less influential (over short time periods and with large pre-treatment migration)

Difference-in-Differences

Identification strategy

In the real world, there are always threats to inference that we can’t measure/observe or understand well enough to adjust for

  • A research design that allows us to isolate a causal effect from observational data
  • Approximates an experiment by ensuring that the treatment and control group are similar at baseline
  • These strategies rely on assumptions that we can attempt to validate

Holy Trinity of Causal Inference


  1. Difference-in-Differences
  2. Regression Discontinuity
  3. Instrumental Variables

Difference-in-Differences


\[ Y_{it} = \alpha + \beta_1 \text{Treatment}_i + \beta_2 \text{Post}_t + \gamma (\text{Treatment}_i \times \text{Post}_t) + \epsilon_{it} \]

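  • \(\gamma\), the coefficient on \(\text{Treatment}_i \times \text{Post}_t\), is the difference-in-differences estimate
  • The basic model assumes the outcome is measured at two points in time: before and after treatment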
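
To see why \(\gamma\) is a difference of differences, the model's expected outcome can be written out for each of the four group-by-period cells. This is a short derivation that uses only the equation above and assumes \(\epsilon_{it}\) has mean zero in every cell.

\[ \begin{aligned} E[Y_{it} \mid \text{Control, Pre}] &= \alpha \\ E[Y_{it} \mid \text{Control, Post}] &= \alpha + \beta_2 \\ E[Y_{it} \mid \text{Treated, Pre}] &= \alpha + \beta_1 \\ E[Y_{it} \mid \text{Treated, Post}] &= \alpha + \beta_1 + \beta_2 + \gamma \end{aligned} \]

\[ \underbrace{(\alpha + \beta_1 + \beta_2 + \gamma) - (\alpha + \beta_1)}_{\text{change in treated group}} - \underbrace{(\alpha + \beta_2) - \alpha}_{\text{change in control group}} = (\beta_2 + \gamma) - \beta_2 = \gamma \]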

Simulation Example

# Load required libraries
library(dplyr)
library(modelsummary)

# Generate example data: four blocks of 50 rows, one per treatment-by-period cell
set.seed(123)
data <- data.frame(
  treatment = rep(c(1, 0), each = 100),      # rows 1-100 treated, rows 101-200 control
  post = rep(c(1, 0), each = 50, times = 2), # within each group: 50 post rows, then 50 pre rows
  outcome = c(rnorm(50, mean = 10, sd = 2), # treatment = 1, post = 1
              rnorm(50, mean = 10, sd = 2), # treatment = 1, post = 0
              rnorm(50, mean = 10, sd = 2), # treatment = 0, post = 1
              rnorm(50, mean = 12, sd = 2)) # treatment = 0, post = 0
)
# True difference-in-differences: (10 - 10) - (10 - 12) = 2

head(data)
  treatment post   outcome
1         1    1  8.879049
2         1    1  9.539645
3         1    1 13.117417
4         1    1 10.141017
5         1    1 10.258575
6         1    1 13.430130

Simulation Example

# Fit an additive model (no interaction) and the difference-in-differences model
additive_model <- lm(outcome ~ treatment + post, data = data)
did_model <- lm(outcome ~ treatment * post, data = data)

# Summarize both models side by side: (1) additive, (2) difference-in-differences
modelsummary(
  list(additive_model, did_model),
  estimate  = "{estimate}{stars} ({std.error})",
  statistic = NULL,
  gof_omit  = 'IC|RMSE|Log|F|R2$|Std.')
                   (1)                  (2)
(Intercept)        11.487*** (0.241)    12.078*** (0.265)
treatment          -0.604* (0.278)      -1.785*** (0.375)
post               -1.405*** (0.278)    -2.585*** (0.375)
treatment × post                        2.361*** (0.531)
Num.Obs.           200                  200
R2 Adj.            0.124                0.201
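
The interaction coefficient can also be computed by hand from the four cell means. The sketch below is a separate, hypothetical re-simulation (the names `sim`, `cell_means`, and `did_by_hand` are not from the slides), coded so that 0 means control or pre-treatment and only treated units in the post period receive a true effect of 2. Its numbers will not match the table above.

```r
# Hypothetical re-simulation: 0 = control / pre, 1 = treated / post
set.seed(123)
sim <- data.frame(
  treatment = rep(c(0, 1), each = 100),
  post      = rep(c(0, 1), each = 50, times = 2)
)

# Baseline mean of 10 in every cell, plus 2 only for treated units after treatment
sim$outcome <- rnorm(200, mean = 10 + 2 * sim$treatment * sim$post, sd = 2)

# Four cell means: rows = treatment (0, 1), columns = post (0, 1)
cell_means <- tapply(sim$outcome, list(sim$treatment, sim$post), mean)

# Change over time within each group, then the difference between those changes
change_treated <- cell_means["1", "1"] - cell_means["1", "0"]
change_control <- cell_means["0", "1"] - cell_means["0", "0"]
did_by_hand <- unname(change_treated - change_control)

# The interaction coefficient from the regression is the same quantity
did_fit <- lm(outcome ~ treatment * post, data = sim)
did_from_lm <- unname(coef(did_fit)["treatment:post"])

print(all.equal(did_by_hand, did_from_lm))
```

Because the model with an interaction fits each of the four cell means exactly, the two numbers should agree up to floating-point error.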

DiD: Assumptions

  • Treatment and control units would have changed in similar ways in the absence of treatment
    • Parallel trends
  • Checking parallel trends requires at least 3 observation periods (at least two before treatment)

Why can’t we just observe how units change over time?

library(ggplot2)

Year = c(0,1,2,3)
Outcome = c(NA, 1.3, 1.7, NA)
Treatment = c("Treatment", "Treatment","Treatment","Treatment")

dat = data.frame(Year, Outcome, Treatment)

ggplot(data = dat, aes(x = Year, y = Outcome, group = Treatment, color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  xlim(0,3) + 
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "solid")) +
  scale_color_manual(values = c("blue") ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())

Why can’t we just observe how units change over time?

Year = c(0,1,2,3)
Outcome = c(0.9, 1.3, 1.7, 2.1)
Treatment = c("Treatment", "Treatment","Treatment","Treatment")

dat = data.frame(Year, Outcome, Treatment)

ggplot(data = dat, aes(x = Year, y = Outcome, group = Treatment, color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  xlim(0,3) + 
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "solid")) +
  scale_color_manual(values = c("blue") ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())
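
The completed figure suggests why a before-and-after comparison can mislead. A small sketch using the four values from the plotting code above, and assuming (this timing is an assumption of the sketch, not stated on the slide) that treatment falls between Year 1 and Year 2:

```r
# Outcome for the treated unit in Years 0-3, copied from the plotting code above
outcome <- c(0.9, 1.3, 1.7, 2.1)

# Year-over-year changes: 0 -> 1, 1 -> 2, 2 -> 3
yearly_change <- diff(outcome)

# Naive before/after "effect", assuming treatment falls between Year 1 and Year 2
before_after <- outcome[3] - outcome[2]

# Change over the year before the assumed treatment date
pre_period_change <- outcome[2] - outcome[1]

print(yearly_change)
print(before_after - pre_period_change)
```

The unit gains about 0.4 every year, including the year before the assumed treatment date, so the before-and-after change cannot be told apart from the pre-existing trend.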

Why can’t we just compare units without randomization?

Year = c(0,1,2,3)
Outcome = c(NA, 1.2, 1.4, NA, 
            NA, 1.3, 1.7, NA)
Treatment = c("Control", "Control","Control","Control", 
              "Treatment", "Treatment", "Treatment", "Treatment")

dat = data.frame(Year, Outcome, Treatment)
dat$Treatment = factor(dat$Treatment, levels = c("Treatment", "Control"))

ggplot(data = dat, aes(x = Year, y = Outcome, group = Treatment, color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  xlim(0,3) + 
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "solid")) +
  scale_color_manual(values = c("blue", "red") ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())

Why can’t we just compare units without randomization?

Year = c(0,1,2,3)
Outcome = c(1, 1.2, 1.4, 1.6, 
            0.9, 1.3, 1.7, 2.1)
Treatment = c("Control", "Control","Control","Control", 
              "Treatment", "Treatment", "Treatment", "Treatment")

dat = data.frame(Year, Outcome, Treatment)
dat$Treatment = factor(dat$Treatment, levels = c("Treatment", "Control"))


ggplot(data = dat, aes(x = Year, y = Outcome, group = Treatment, color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  xlim(0,3) + 
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "solid")) +
  scale_color_manual(values = c("blue", "red") ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())
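
The same point can be checked numerically. This sketch reuses the values from the plotting code above; the variable names `gap` and `gap_change` are introduced here for illustration.

```r
# Outcomes for Years 0-3, copied from the plotting code above
control   <- c(1.0, 1.2, 1.4, 1.6)
treatment <- c(0.9, 1.3, 1.7, 2.1)

# Cross-sectional gap between the groups in each year
gap <- treatment - control

# How the gap changes from one year to the next
gap_change <- diff(gap)

print(gap)
print(gap_change)
```

The gap between the groups widens by about 0.2 in every period, including periods with no change in treatment status. A raw comparison of the two groups in any single year therefore mixes pre-existing differences and diverging trends with any treatment effect.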

Why can’t we just compare units without randomization?

Year = c(0,1,2,3)
Outcome = c(NA, 1.3, 1.5, NA,
            1, 1.2, 1.4, NA,
            1.1, 1.3, 1.7, NA)
Treatment = c("Comparison","Comparison","Comparison","Comparison",
              "Control", "Control","Control","Control",
              "Treatment", "Treatment", "Treatment", "Treatment")

dat = data.frame(Year, Outcome, Treatment)
dat$Treatment = factor(dat$Treatment, levels = c("Treatment", "Comparison", "Control"))

ggplot(data = dat, aes(x = Year, y = Outcome,  color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "dotted", "solid")) +
  scale_color_manual(values = c("blue", "black", "red"  ) ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())

Why can’t we just compare units without randomization?

Year = c(0,1,2,3)
Outcome = c(NA, 1.3, 1.5, 1.7,
            1, 1.2, 1.4,1.6,
            1.1, 1.3, 1.7, 1.9)
Treatment = c("Comparison","Comparison","Comparison","Comparison",
              "Control", "Control","Control","Control",
              "Treatment", "Treatment", "Treatment", "Treatment")

dat = data.frame(Year, Outcome, Treatment)
dat$Treatment = factor(dat$Treatment, levels = c("Treatment", "Comparison", "Control"))

ggplot(data = dat, aes(x = Year, y = Outcome,  color = Treatment)) +
  geom_line(aes(linetype = Treatment), linewidth = 2) +
  geom_point(size = 6) +
  ylim(0.8, 2.2) +
  scale_linetype_manual(values=c("solid", "dotted", "solid")) +
  scale_color_manual(values = c("blue", "black", "red"  ) ) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size=20),
        legend.title=element_blank())
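
The dotted "Comparison" line in this figure can be reconstructed from the other two series. A minimal sketch, assuming treatment falls between Year 1 and Year 2 (the slide does not state the timing; this is read off the figure):

```r
# Outcomes for Years 0-3, copied from the plotting code above
control   <- c(1.0, 1.2, 1.4, 1.6)
treatment <- c(1.1, 1.3, 1.7, 1.9)

# Pre-period check (Year 0 -> Year 1): do the two groups move in parallel?
pre_trend_gap <- (treatment[2] - treatment[1]) - (control[2] - control[1])

# Counterfactual for Year 2: treated value in Year 1 plus the control group's change
counterfactual_y2 <- treatment[2] + (control[3] - control[2])

# Difference-in-differences estimate: observed treated outcome minus the counterfactual
did_estimate <- treatment[3] - counterfactual_y2

print(c(pre_trend_gap = pre_trend_gap,
        counterfactual_y2 = counterfactual_y2,
        did_estimate = did_estimate))
```

Here the pre-period changes match (both groups rise by about 0.2), the reconstructed counterfactual of about 1.5 equals the Year 2 value of the Comparison series in the plotting code, and the estimated effect is about 0.2.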

Sviatschi (2022)


  • What is the effect of mass deportations on the root causes of migration?

Three sources of variation


  1. Geography (birth municipality)
  2. Time (policy change)
  3. Age (recruitment age)
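
With three sources of variation, one generic way to combine them is a triple-difference specification. The version below is a textbook-style sketch, not the specification estimated in Sviatschi (2022). The indicators \(G_m\) (municipality exposed to deportations), \(P_t\) (period after the policy change), and \(A_a\) (cohort of recruitment age) are names assumed here for illustration.

\[ Y_{mat} = \alpha + \beta_1 G_m + \beta_2 P_t + \beta_3 A_a + \beta_4 (G_m \times P_t) + \beta_5 (G_m \times A_a) + \beta_6 (P_t \times A_a) + \delta (G_m \times P_t \times A_a) + \epsilon_{mat} \]

In this form, \(\delta\) plays the role that \(\gamma\) plays in a two-way difference-in-differences: it compares the post-policy change in exposed municipalities across age groups, net of the same comparison in other municipalities.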

Identifying assumption


  • Municipalities where gang-deportees were born would have changed in similar ways to other municipalities in the absence of a policy change

Findings

Policy Implications


  • Addressing pull factors can have unintended effects on push factors

Threats to Inference


  • Confounders
  • Colliders
  • Mechanisms
  • Reverse Causality
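
Of these threats, colliders are often the least intuitive: adjusting for a variable caused by both treatment and outcome can create an association where none exists. A minimal simulated sketch (all variable names and parameter values are hypothetical):

```r
set.seed(1)
n <- 5000

x <- rnorm(n)                   # "treatment" with no true effect on y
y <- rnorm(n)                   # outcome, generated independently of x
collider <- x + y + rnorm(n)    # variable caused by both x and y

# Coefficient on x without and with adjustment for the collider
coef_without <- unname(coef(lm(y ~ x))["x"])
coef_with    <- unname(coef(lm(y ~ x + collider))["x"])

print(c(without_collider = coef_without, with_collider = coef_with))
```

Without the collider, the coefficient on `x` should be close to zero, which is the truth in this simulation. Once the collider is added as a control, the coefficient becomes clearly negative even though `x` has no effect on `y`.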

Adjusting on Observables


  • Matching
  • Weighting
  • Synthetic Control (very fancy weighting)
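
As an illustration of weighting, the sketch below applies inverse probability weighting to simulated data with a single observed confounder. It is a toy example under stated assumptions (the true effect is set to 1, and the propensity model is correctly specified), not a recipe for real data.

```r
set.seed(42)
n <- 5000

x <- rnorm(n)                    # observed confounder
d <- rbinom(n, 1, plogis(x))     # treatment is more likely when x is high
y <- 1 * d + 2 * x + rnorm(n)    # true effect of d on y is 1

# Naive comparison of treated and untreated means (confounded by x)
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Estimated propensity scores from a logistic regression of d on x
ps <- fitted(glm(d ~ x, family = binomial))

# Inverse probability weights: 1 / P(treated) for treated, 1 / P(untreated) otherwise
w <- ifelse(d == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in means
ipw <- weighted.mean(y[d == 1], w[d == 1]) - weighted.mean(y[d == 0], w[d == 0])

print(c(naive = naive, ipw = ipw))
```

The naive difference overstates the effect because treated units have higher `x` on average. The weighted estimate should land much closer to 1, but only because the confounder is observed and included in the propensity model.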

Validity

  • Internal validity
  • External validity
  • What are the trade-offs between experiments and observational studies?
    • Experiments have more internal validity
    • But they often rely on artificial treatments and convenience samples, which limits external validity
  • Where are these studies used in the real world?