Final Project Overview

Author
Affiliation

Jeremy Springman

University of Pennsylvania

Final Project Overview

Objectives

The final project is the culmination of this course

  • Reflect the substantive knowledge and technical skills that you developed
  • Provide experience
    • Generating a research question
    • Refining into a testable hypothesis
    • Identifying data to provide evidence for or against
    • Communicating the results of your analysis to others

Objectives

  • Help you answer research questions in the future
  • Assess the strength of evidence presented by others
  • Incorporate research into your professional decision-making

Requirements

  • Research project with data of your choosing
    • Formulate a research question
    • Find data that can help you answer that question
    • Apply the tools and methods from this course
    • Write-up analysis
  • Produce a webpage to present your results for public consumption

Write-up

  1. Introduction to research question and data
  2. Discussion of research design, assumptions, and threats to inference
  3. Visualization describing your data
  4. Presentation of results from a regression model and discussion of implications for research question
  5. Robustness test to probe assumptions or empirical extension
  6. Discussion of policy implications
  7. Appendix including supplementary materials

Final Project Touchpoints


Milestone Due Date
Create a GitHub repository Feb 17
Research Question and Data Feb 19
Research Design Mar 5
Submit proposal Apr 2
Submit final project May 10

Examples

Assignment 1: Research Question and Data

Research Question and Data

Must be submitted via Slack by 11:59pm EST on Monday, February 19

  • Sketch a research question that you’d like to investigate
  • Identify data that can answer that question
  • I will provide feedback on the viability of the questions, the suitability of the data, and the extent to which your general idea will meet my expectations for the final project

Research Question and Data

Send me a quarto html file that:

  • Briefly describes your idea for a research question
    • 3-4 sentences describing some relationship in the world that you want to investigate
    • This should involve at least two things in the world that can be measured with existing data
    • You may submit more than 1 idea.
  • Proposes data and measures that will help you answer it
    • Specific, existing dataset that you can access
    • Specific variables that will be used to answer the research question

Types of data

There are many types of data in the world. Below is a brief discussion of the most common sources of data in the social sciences.

Types of data

  • Replication data
    • Any published research from the last 5-10 years should make the data and analysis files publicly available. You can almost always find where these replication materials are hosted on the article’s webpage at whichever journal puslished the article. Oftentimes, these data are hosted on Harvard’s Dataverse.

Types of data

  • Survey data
    • Survey data is used extremely heavily on the social sciences. Most prominently in political science are the various ‘barometer’ surveys (Afrobarometer, Latinobarometer, etc.).
  • Administrative data
    • Data on government (or organization) programs or
  • Expert-coded data
    • Data where experts code the characteristics of countries or political entities (such as parties)

My Datasets

  • Machine Learning for Peace
    • This data captures the volume of reporting from high-quality, local news sources on 42 distinct political events from 2012-2023 for 60 aid-receiving countries.
    • Focuses on events that constitute changes in civic space, such as censorship, legal changes, and other forms of repression
  • Cambodian NGOs (n \(\approx\) 100)
    • Convenience sample
    • Panel survey, financial data, networks, open-ended responses

My Datasets

  • Ethiopian University Students (n \(\approx\) 900)
    • Representative sample
    • Panel survey, networks, open-ended responses
  • Ghanaian Radio Stations (n \(\approx\) 400)
    • Convenience sample

Assignment 2: Research Design

  1. Describe your research question and provide some background on why you find it interesting or important. Be sure to incorporate the feedback provided on the first assignment. Include references to at least two pieces of existing research that illustrate what has already been discovered about the relationship you are investigating. This could be articles published in academic journals, policy reports and white papers, articles published by think tanks and other credible organizations (ex. Brookings, the United Nations High Commissioner for Refugees, etc.), or pieces of data journalism. A full-credit response will probably be around 150-200 words. (40%)
  2. State at least one testable hypothesis. Distill your research question into a statement about a specific relationship you expect to see in the world. In most cases, this hypothesis will describe a causal relationship between two variables (i.e. changes in x cause changes in y). Make an argument for why you expect to see this relationship. This should be based on related findings from existing research and/or your own theoretical/logical reasoning. A full-credit response will probably be around 150-200 words. (25%)
  3. Briefly discuss the specific variables you will use to test your hypothesis and the dataset they are drawn from. Be sure to mention the source of the data (the organization or individuals that produced it), the unit of analysis, and the sample. Create a ggplot to visualize the relationship between the variables being used to test your hypothesis. The best method of visualizing this relationship will depend on the scale of your variables. Good methods to consider are scatter plots (for continuous variables), grouped bar charts (for ordinal or binary variables), and line graphs (for continuous variables with time-series). (25%)
  4. Specify the main regression model you will use to test your hypothesis. This regression model should provide a preliminary test for or against the validity of your hypothesis. Use markdown to render the equation neatly to the html file. (10%)