Global Development: Intermediate Topics in Politics, Policy, and Data
PSCI 3200 - Spring 2024
Logistics
Assignments
Did everyone find the readings and slides for today?
For next week:
I’ll scan the chapter and upload tonight
Remember you have a quasi-assignment
Agenda
Correlation
What is it?
What is it composed of?
What is it good for?
Causation
What is it good for?
Why is it hard?
Potential outcomes and counterfactuals
Correlation
Which of the following statements describe a correlation?
Most professional data analysts took a statistics course in college.
The longer a person runs the more calories they burn.
People who live to be 100 years old typically take vitamins.
Older people vote more than younger people.
Correlations: Quantitative Comparison
Lots of bad analysis implies comparisons without actually making them
Ex. 10 things that extremely successful people do to be productive
Ex. 60% of Americans now live paycheck-to-paycheck
Ex. 70% of participants reported an improvement
Avoid ‘selecting on the dependent variable’
Applies to qualitative comparisons as well
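A minimal simulated sketch of why selecting on the dependent variable misleads. Everything here is hypothetical: the habit (`wakes_early`), the 70% and 10% rates, and the seed are made up for illustration. Success is drawn independently of the habit, so the habit cannot explain it, yet looking only at successful people still "finds" that most of them have it.

```r
set.seed(3200)  # hypothetical seed, only for reproducibility
n <- 10000

# Hypothetical habit held by roughly 70% of everyone
wakes_early <- rbinom(n, 1, 0.7)

# Success is drawn independently of the habit (no relationship by construction)
successful <- rbinom(n, 1, 0.1)

# Selecting on the dependent variable: look only at successful people
share_among_successful <- mean(wakes_early[successful == 1])

# The comparison that the headline leaves out
share_among_others <- mean(wakes_early[successful == 0])

cat("Share with habit, successful:", round(share_among_successful, 2), "\n")
cat("Share with habit, everyone else:", round(share_among_others, 2), "\n")
```

The two shares should come out close to each other, which only becomes visible once the unsuccessful group is included in the comparison.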
Correlations: Necessary Components
What do we need to calculate correlations?
Measures of central tendency
Mean
Measures of spread
Variance
Standard deviation
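The two measures of spread can be sketched by hand in R, in the same step-by-step style as the mean. This assumes the sample versions (dividing by n - 1), which is the convention R's built-in `var()` and `sd()` use; the seed and the variable names are illustrative.

```r
set.seed(3200)  # hypothetical seed, only for reproducibility
x <- rnorm(10, mean = 10, sd = 5)

# Step 1: Calculate the mean
n <- length(x)
x_bar <- sum(x) / n

# Step 2: Variance is the sum of squared deviations from the mean,
# divided by n - 1 (sample variance)
var_by_hand <- sum((x - x_bar)^2) / (n - 1)

# Step 3: Standard deviation is the square root of the variance,
# which puts spread back in the original units of x
sd_by_hand <- sqrt(var_by_hand)

# Check against the built-in functions
print(all.equal(var_by_hand, var(x)))
print(all.equal(sd_by_hand, sd(x)))
```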
Central Tendency: Mean
\[
\mu_X = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
my_vector = rnorm(10, mean = 10, sd = 5)
# Step 1: Sum the values
sum_values <- sum(my_vector)
# Step 2: Count the number of elements
count_elements <- length(my_vector)
# Step 3: Calculate the mean
mean_value <- sum_values / count_elements
print(mean_value)
## Assumes `dat` (a numeric vector) and `o_var` (its original variance) already exist
## Copy the data for the big addition and store vector length
b_dat = dat
ind = length(b_dat)
## Add four to the largest number in the vector and calculate size of var increase
b_dat[ind] = b_dat[ind] + 4
b_var = var(b_dat)
val = b_var - o_var
cat("Variance increases by", val)
Variance increases by 8.890842
Add a constant to a smaller number
## Assumes `dat`, `o_var`, and `ind` already exist
## Copy the data for the small addition
s_dat = dat
## Add four to a smaller number in the vector and calculate size of var increase
s_dat[ind-2] = s_dat[ind-2] + 4
s_var = var(s_dat)
val = s_var - o_var
cat("Variance increases by", val)
Expected change in \(Y\) with 1-unit change in \(X\)
Measures of Correlation
What does the correlation coefficient tell you that slope doesn’t?
Consistency of the relationship on bounded scale (-1 to 1)
What does slope tell you that the correlation coefficient doesn’t?
Substantive importance (magnitude)
Give an example of when you’d prefer each
Correlation: When comparing relationships on different scales
Slope: When thinking about ROI
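A short simulated sketch of the contrast between the two measures. The variables (spending in dollars, an outcome `y`) and all numbers are hypothetical. Rescaling X from dollars to thousands of dollars changes the slope by a factor of 1,000 but leaves the correlation coefficient untouched, which is why correlation is convenient across scales while slope carries the substantive "per unit" magnitude.

```r
set.seed(3200)  # hypothetical seed, only for reproducibility
n <- 200

# Hypothetical data: spending in dollars and an outcome that rises with it
x_dollars <- runif(n, 0, 1000)
y <- 2 + 0.01 * x_dollars + rnorm(n, sd = 2)

# Same spending, measured in thousands of dollars
x_thousands <- x_dollars / 1000

# Slope: expected change in y for a 1-unit change in x
slope_dollars <- cov(x_dollars, y) / var(x_dollars)
slope_thousands <- cov(x_thousands, y) / var(x_thousands)

# Correlation: strength of the linear relationship on a -1 to 1 scale
cor_dollars <- cor(x_dollars, y)
cor_thousands <- cor(x_thousands, y)

cat("Slope per dollar:", slope_dollars, "\n")
cat("Slope per thousand dollars:", slope_thousands, "\n")
cat("Correlation (dollars):", cor_dollars, "\n")
cat("Correlation (thousands):", cor_thousands, "\n")

# The slope is the same number lm() reports for a bivariate regression
lm_slope <- unname(coef(lm(y ~ x_dollars))[2])
```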
Correlation
What can we do with them?
Description: quantitative comparisons
sample matters a lot
sample matters less
Forecasting: sample population \(\rightarrow\) out-of-sample
Causal inference: correlation + research design
Simple, but powerful
Non-linearities, interactions, machine learning
Causation
Schools of Thought
Potential outcomes and counterfactuals (Econ)
DAGs and do-calculus (CS)
Manipulability (Philosophy)
“We think of a cause as something that makes a difference, and the difference it makes must be a difference from what would have happened without it.” (Lewis, 1973)
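The counterfactual idea in the quote can be sketched with simulated potential outcomes. This is an illustration under made-up assumptions: the effect size (0.3), the selection rule, and all variable names are hypothetical. In a simulation both potential outcomes are visible for every unit; in real data only one of them ever is, which is what makes the naive treated-versus-untreated comparison unreliable.

```r
set.seed(3200)  # hypothetical seed, only for reproducibility
n <- 1000

# Potential outcomes for every unit
y0 <- rnorm(n, mean = 1.2, sd = 0.2)  # outcome without treatment
y1 <- y0 + 0.3                        # outcome with treatment (assumed effect of 0.3)

# Hypothetical selection: units with higher y0 are more likely to be treated
treated <- rbinom(n, 1, prob = plogis(5 * (y0 - 1.2)))

# We only ever observe one potential outcome per unit
y_obs <- ifelse(treated == 1, y1, y0)

# True average effect: uses both potential outcomes (unobservable in practice)
true_effect <- mean(y1 - y0)

# Naive comparison: difference in observed means between groups
naive_diff <- mean(y_obs[treated == 1]) - mean(y_obs[treated == 0])

cat("True average effect:", true_effect, "\n")
cat("Naive difference in means:", naive_diff, "\n")
```

Because treated units started out with higher untreated outcomes, the naive difference overstates the effect in this setup.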
Causality: Why bother?
Understanding cause and effect is how we change things in the real world
Causal inference separates good evaluations from bad
Policy change
Development intervention
Causal identification is not binary
It’s harder for some policies and interventions than others
Variety of tools that can help us rule out different threats to inference
Causality: Why bother?
library(ggplot2)
Year = c(0, 1, 2, 3)
Outcome = c(NA, 1.2, 1.4, NA)
Treatment = c("Control", "Control", "Control", "Control")
dat = data.frame(Year, Outcome, Treatment)
ggplot(data = dat, aes(x = Year, y = Outcome, group = Treatment)) +
  geom_line(aes(linetype = Treatment), size = 2) +
  geom_point(size = 6) +
  scale_linetype_manual(values = c("solid")) +
  xlim(0, 3) +
  scale_y_continuous(limits = c(1, 1.85), breaks = seq(1, 1.85, by = .1)) +
  theme(legend.position = "none", text = element_text(size = 20))