Sampling and Population Characteristics

Descriptive Inference

Jeremy Springman

University of Pennsylvania

Global Development: Intermediate Topics in Politics, Policy, and Data

PSCI 3200 - Spring 2024

Logistics

Assignments

  • Today
    • Read Ch 3
    • Install Quarto and create an empty HTML document
  • Thursday
    • Read Ch 4
    • Create a git repo for this class (psci3200_yourname)

Agenda

  1. Finishing up Causality 1
  2. Inferring Population Characteristics via Survey Research
  3. Messing around in Quarto
  4. Final Project

Finishing up Causality 1

Identifying Assumptions

  • You must control for…
    • everything (observed and unobserved) that affects both the treatment variable and the outcome variable
  • You must not control for…
    • anything that is affected by both the treatment variable and the outcome variable
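
The second rule describes a collider. A minimal simulation sketch of why conditioning on one is a problem, assuming a treatment and an outcome that are truly unrelated and a collider equal to their sum plus noise. All variable names, the seed, and the cutoff of 1 are illustrative assumptions, not taken from the course materials.

```python
import random

def correlation(xs, ys):
    """Pearson correlation, written out by hand to keep the example self-contained."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(3200)  # arbitrary seed

# Treatment and outcome are generated independently: the true relationship is zero.
treatment = [rng.gauss(0, 1) for _ in range(20_000)]
outcome = [rng.gauss(0, 1) for _ in range(20_000)]

# The collider is caused by both the treatment and the outcome.
collider = [t + o + rng.gauss(0, 0.5) for t, o in zip(treatment, outcome)]

# Conditioning on the collider: keep only units with a high value of it.
selected = [(t, o) for t, o, c in zip(treatment, outcome, collider) if c > 1]

corr_all = correlation(treatment, outcome)
corr_selected = correlation([t for t, _ in selected], [o for _, o in selected])

print(f"all units:     {corr_all:.2f}")
print(f"collider > 1:  {corr_selected:.2f}")
```

Among all units the correlation should sit near zero, while among the selected units it turns clearly negative, even though nothing links treatment and outcome in the simulated data.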

Colliders

Plumbing vs Science

  • Scientists can tell us the drivers of human welfare and prosperity
    • Can we manipulate these macro-forces? Probably not.
  • So what can we do as social scientists?
    • Help at the margins in specific places with specific policies and programs
  • Any normatively uncomfortable findings?

Inferring Population Characteristics via Survey Research

Survey Research

  • We often want to describe a population of interest
    • Examples?
    • How is this different from correlation?
  • We usually can’t collect data on the entire population
  • Sampling allows us to estimate population characteristics from a subset of the population
  • Random sampling gives us the best chance of drawing a sample that is representative of the population

Getting a Representative Sample

  • Sampling frame
    • Comprehensive list of units
    • Ideally: Census, administrative records
    • Developing contexts: create this yourself on the ground
  • Non-response
    • Selection into the sample can bias estimates and induce spurious relationships
    • Think about collider example
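
A rough sketch of how non-response can bias an estimate even when the invitations themselves are a random sample. The population, the outcome ("satisfied with a public service"), and the response rates are all hypothetical values chosen for illustration.

```python
import random

rng = random.Random(7)  # arbitrary seed

# Hypothetical population: 1 = satisfied with a public service, 0 = not.
population = [1 if rng.random() < 0.5 else 0 for _ in range(50_000)]

# Assumed response rates: satisfied people answer the survey more often.
RESPONSE_RATE = {1: 0.8, 0: 0.4}

# Invitations are a simple random sample, so the invited group is representative.
invited = rng.sample(population, 5_000)

# Whether someone responds depends on the outcome we are trying to measure.
respondents = [y for y in invited if rng.random() < RESPONSE_RATE[y]]

true_share = sum(population) / len(population)
invited_share = sum(invited) / len(invited)
respondent_share = sum(respondents) / len(respondents)

print(f"population:  {true_share:.2f}")
print(f"invited:     {invited_share:.2f}")
print(f"respondents: {respondent_share:.2f}")
```

The invited group should track the population closely, while the respondents overstate satisfaction because responding is itself related to the outcome.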

Dealing with Missingness

  • Unit non-response
    • Incentives
    • Fancy weighting
  • Item non-response
    • Self-administration
  • Misreporting
    • List experiments
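
A minimal sketch of the usual difference-in-means estimator for a list experiment, under simulated data. The control group sees a list of non-sensitive items, the treatment group sees the same list plus one sensitive item, and each respondent reports only how many items apply. The prevalence, the number of baseline items, and all names below are assumptions for illustration.

```python
import random

rng = random.Random(81)  # arbitrary seed

TRUE_PREVALENCE = 0.3   # assumed share for whom the sensitive item applies
N_BASELINE_ITEMS = 3    # non-sensitive items shown to everyone

def reported_count(sees_sensitive_item):
    """Number of list items a simulated respondent says apply to them."""
    count = sum(rng.random() < 0.5 for _ in range(N_BASELINE_ITEMS))
    if sees_sensitive_item and rng.random() < TRUE_PREVALENCE:
        count += 1
    return count

# Generating the two groups separately stands in for random assignment.
control = [reported_count(False) for _ in range(10_000)]
treatment = [reported_count(True) for _ in range(10_000)]

# The difference in mean counts estimates the prevalence of the sensitive item.
estimate = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"estimated prevalence: {estimate:.2f}")
```

No individual ever reveals their answer to the sensitive item, yet the group-level difference should land close to the assumed prevalence.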

Dealing with Missingness

  • Always look at NAs immediately
  • Always think about how this might bias estimates
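
One way to act on both points, sketched in plain Python on a made-up toy dataset (the rows and column names are invented for illustration): count the missing values per column, then check whether missingness on one variable lines up with the values of another.

```python
# Toy survey data; None marks a missing answer. All values are invented.
rows = [
    {"age": 23, "income": 1200, "trust": 4},
    {"age": 35, "income": None, "trust": 2},
    {"age": 41, "income": 3100, "trust": None},
    {"age": 29, "income": None, "trust": 1},
    {"age": 52, "income": 2800, "trust": 5},
    {"age": 38, "income": None, "trust": 2},
]
columns = ["age", "income", "trust"]

# Step 1: count missing values in every column.
missing_counts = {col: sum(row[col] is None for row in rows) for col in columns}
print(missing_counts)

def mean_trust(group):
    """Mean of the trust variable among rows where it was answered."""
    observed = [row["trust"] for row in group if row["trust"] is not None]
    return sum(observed) / len(observed)

# Step 2: compare respondents who reported income with those who did not.
trust_if_income_reported = mean_trust([r for r in rows if r["income"] is not None])
trust_if_income_missing = mean_trust([r for r in rows if r["income"] is None])
print(trust_if_income_reported, trust_if_income_missing)
```

In this toy data, people who skipped the income question report lower trust, so dropping their rows would push any trust estimate upward.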

Sampling Strategies

  • Probability sampling strategies:
    • Simple random sampling
    • Stratified sampling
    • Cluster sampling
  • Non-probability sampling strategies:
    • Convenience sample

Simple Random Sampling

  • Each unit within the population has the same probability of being chosen
  • As sample size increases, the sample more closely resembles the population
  • Assumes that unit and item non-response are effectively random
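
A small simulation sketch of the first two points, assuming a hypothetical population in which about 30% of units have some trait. The population size, sample sizes, and number of repeated draws are arbitrary choices.

```python
import random

rng = random.Random(2024)  # arbitrary seed

# Hypothetical population of 100,000 units; about 30% have the trait of interest.
population = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
true_share = sum(population) / len(population)

def mean_abs_error(sample_size, draws=100):
    """Average gap between the sample share and the population share."""
    total = 0.0
    for _ in range(draws):
        # rng.sample draws without replacement; every unit has the same chance.
        sample = rng.sample(population, sample_size)
        total += abs(sum(sample) / sample_size - true_share)
    return total / draws

errors = {n: mean_abs_error(n) for n in (100, 1_000, 10_000)}
for n, err in errors.items():
    print(f"n = {n:>6}: average error = {err:.4f}")
```

The average error should shrink as the sample grows. The simulation has no non-response at all, which is the assumption named in the last bullet.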

Stratified Sampling

  • Divide the entire population into homogeneous strata
  • Take random samples from each stratum
    • Decreases risk of imbalance
    • Ex. Race, gender, partisanship
  • Oversampling
    • Allows for more precise estimates of small group characteristics
    • Must reweight by population shares when computing population estimates
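
A sketch of oversampling and the adjustment it requires, assuming two hypothetical strata ("A" with 90% of the population, "B" with 10%) and outcome rates chosen for illustration. Each stratum mean is weighted by that stratum's share of the population.

```python
import random

rng = random.Random(111)  # arbitrary seed

# Hypothetical population: stratum "A" holds 90% of units, stratum "B" 10%.
# The outcome is assumed to be more common in the small stratum.
OUTCOME_RATE = {"A": 0.3, "B": 0.7}
strata = {
    "A": [1 if rng.random() < OUTCOME_RATE["A"] else 0 for _ in range(90_000)],
    "B": [1 if rng.random() < OUTCOME_RATE["B"] else 0 for _ in range(10_000)],
}
population_size = sum(len(units) for units in strata.values())
true_mean = sum(sum(units) for units in strata.values()) / population_size

# Sample 500 units per stratum, which oversamples the small stratum "B".
samples = {name: rng.sample(units, 500) for name, units in strata.items()}
stratum_means = {name: sum(s) / len(s) for name, s in samples.items()}

# Pooling the samples ignores the oversampling and overstates the outcome.
pooled = samples["A"] + samples["B"]
unweighted_mean = sum(pooled) / len(pooled)

# Weighting each stratum mean by its population share corrects for it.
weighted_mean = sum(
    stratum_means[name] * len(units) / population_size
    for name, units in strata.items()
)

print(f"true: {true_mean:.2f}  unweighted: {unweighted_mean:.2f}  "
      f"weighted: {weighted_mean:.2f}")
```

The 500 units from stratum "B" give a usable estimate for the small group, and the weighted mean should still land close to the population value, while the unweighted mean does not.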

Cluster Sampling

  • Divide population into heterogeneous clusters
  • Take random sample of clusters
    • Decreases costs of sampling
    • Ex. Districts, schools, businesses
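
A sketch of one-stage cluster sampling on a hypothetical population of 200 schools with 50 students each (all numbers are illustrative assumptions). Whole schools are sampled at random and every student in a sampled school is surveyed. A simple random sample of the same number of students is drawn for comparison.

```python
import random

rng = random.Random(11)  # arbitrary seed

# Hypothetical population: each student is (school_id, outcome).
# Outcome rates are assumed to differ from school to school.
students = []
for school_id in range(200):
    school_rate = rng.uniform(0.3, 0.7)
    for _ in range(50):
        students.append((school_id, 1 if rng.random() < school_rate else 0))

true_mean = sum(y for _, y in students) / len(students)

# Cluster sample: draw 20 schools at random, then survey every student in them.
sampled_schools = set(rng.sample(range(200), 20))
cluster_sample = [(s, y) for s, y in students if s in sampled_schools]
cluster_mean = sum(y for _, y in cluster_sample) / len(cluster_sample)

# Simple random sample of the same number of students, for comparison.
srs = rng.sample(students, len(cluster_sample))
srs_mean = sum(y for _, y in srs) / len(srs)

schools_visited_cluster = len(sampled_schools)
schools_visited_srs = len({s for s, _ in srs})

print(f"true mean: {true_mean:.2f}")
print(f"cluster:   {cluster_mean:.2f} from {schools_visited_cluster} schools")
print(f"SRS:       {srs_mean:.2f} from {schools_visited_srs} schools")
```

Both designs survey the same number of students, but the cluster sample only requires visiting 20 schools, while the simple random sample touches most of them. Because all schools here are the same size, the plain mean of the cluster sample needs no extra weighting.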

Final Project Discussion

Final Project

  • Data analysis project with data of your choosing
    • Formulate a research question
    • Find data that can help you answer that question
    • Apply the tools and methods from this course
    • Write up the analysis
  • Produce a webpage to present your results for public consumption

Write-up

  1. Introduction to research question and data
  2. Discussion of research design, assumptions, and threats to inference
  3. Visualization describing your data
  4. Presentation of results from a regression model and discussion of implications for research question
  5. Discussion of policy implications

Public Datasets

  • Varieties of Democracy
  • Machine Learning for Peace
  • Afrobarometer, Arab Barometer
  • Armed Conflict Location & Event Data Project (ACLED)
  • AidData

My Datasets

  • Cambodian NGOs (n ≈ 100)
    • Convenience sample
    • Panel survey, financial data, networks, open-ended responses
  • Ethiopian University Students (n ≈ 900)
    • Representative sample
    • Panel survey, networks, open-ended responses
  • Ghanaian Radio Stations (n ≈ 400)
    • Convenience sample

DevLab Datasets

  • Too many to recall, but inquire with me and I will ask around

Final Project


Milestone                     Due Date
Create a GitHub repository    Feb 8th
Identify data source          Mar 5th
Submit proposal               Apr 2nd
Submit final project          May 2nd

RStudio and Quarto

Poll

  • Mac vs Windows?
  • Quarto running?

Instructions

  • Follow along as I create a Quarto page
  • Submit the HTML file for the page to me via Slack before the end of class