Final Project Overview
Final Project Overview
Objectives
The final project is the culmination of this course
- Reflect the substantive knowledge and technical skills that you developed
- Provide experience
- Generating a research question
- Refining into a testable hypothesis
- Identifying data to provide evidence for or against
- Communicating the results of your analysis to others
Objectives
- Help you answer research questions in the future
- Assess the strength of evidence presented by others
- Incorporate research into your professional decision-making
Requirements
- Research project with data of your choosing
- Formulate a research question
- Find data that can help you answer that question
- Apply the tools and methods from this course
- Write-up analysis
- Produce a webpage to present your results for public consumption
Write-up
- Introduction to research question and data
- Discussion of research design, assumptions, and threats to inference
- Visualization describing your data
- Presentation of results from a regression model and discussion of implications for research question
- Robustness test to probe assumptions or empirical extension
- Discussion of policy implications
- Appendix including supplementary materials
Final Project Touchpoints
Milestone | Due Date |
---|---|
Create a GitHub repository | Feb 17 |
Research Question and Data | Feb 19 |
Research Design | Mar 5 |
Submit proposal | Apr 2 |
Submit final project | May 10 |
Examples
Assignment 1: Research Question and Data
Research Question and Data
Must be submitted via Slack by 11:59pm EST on Monday, February 19
- Sketch a research question that you’d like to investigate
- Identify data that can answer that question
- I will provide feedback on the viability of the questions, the suitability of the data, and the extent to which your general idea will meet my expectations for the final project
Research Question and Data
Send me a quarto html file that:
- Briefly describes your idea for a research question
- 3-4 sentences describing some relationship in the world that you want to investigate
- This should involve at least two things in the world that can be measured with existing data
- You may submit more than 1 idea.
- Proposes data and measures that will help you answer it
- Specific, existing dataset that you can access
- Specific variables that will be used to answer the research question
Types of data
There are many types of data in the world. Below is a brief discussion of the most common sources of data in the social sciences.
- Election returns
- Compilations of election data, such as Constituency-Level Elections Archive (CLEA)
- Returns for specific elections are often available from a country’s electoral commission website
Types of data
- Replication data
- Any published research from the last 5-10 years should make the data and analysis files publicly available. You can almost always find where these replication materials are hosted on the article’s webpage at whichever journal puslished the article. Oftentimes, these data are hosted on Harvard’s Dataverse.
Types of data
- Survey data
- Survey data is used extremely heavily on the social sciences. Most prominently in political science are the various ‘barometer’ surveys (Afrobarometer, Latinobarometer, etc.).
- Administrative data
- Data on government (or organization) programs or
- Expert-coded data
- Data where experts code the characteristics of countries or political entities (such as parties)
Popular Public Datasets
- Varieties of Democracy
- World Bank Development/Governance Indicators
- Armed Conflict Location & Event Data Project (ACLED)
- AidData
- Demographic and Health Survey
My Datasets
- Machine Learning for Peace
- This data captures the volume of reporting from high-quality, local news sources on 42 distinct political events from 2012-2023 for 60 aid-receiving countries.
- Focuses on events that constitute changes in civic space, such as censorship, legal changes, and other forms of repression
- Cambodian NGOs (n \(\approx\) 100)
- Convenience sample
- Panel survey, financial data, networks, open-ended responses
My Datasets
- Ethiopian University Students (n \(\approx\) 900)
- Representative sample
- Panel survey, networks, open-ended responses
- Ghanaian Radio Stations (n \(\approx\) 400)
- Convenience sample
Assignment 2: Research Design
- Describe your research question and provide some background on why you find it interesting or important. Be sure to incorporate the feedback provided on the first assignment. Include references to at least two pieces of existing research that illustrate what has already been discovered about the relationship you are investigating. This could be articles published in academic journals, policy reports and white papers, articles published by think tanks and other credible organizations (ex. Brookings, the United Nations High Commissioner for Refugees, etc.), or pieces of data journalism. A full-credit response will probably be around 150-200 words. (40%)
- State at least one testable hypothesis. Distill your research question into a statement about a specific relationship you expect to see in the world. In most cases, this hypothesis will describe a causal relationship between two variables (i.e. changes in x cause changes in y). Make an argument for why you expect to see this relationship. This should be based on related findings from existing research and/or your own theoretical/logical reasoning. A full-credit response will probably be around 150-200 words. (25%)
- Briefly discuss the specific variables you will use to test your hypothesis and the dataset they are drawn from. Be sure to mention the source of the data (the organization or individuals that produced it), the unit of analysis, and the sample. Create a
ggplot
to visualize the relationship between the variables being used to test your hypothesis. The best method of visualizing this relationship will depend on the scale of your variables. Good methods to consider are scatter plots (for continuous variables), grouped bar charts (for ordinal or binary variables), and line graphs (for continuous variables with time-series). (25%) - Specify the main regression model you will use to test your hypothesis. This regression model should provide a preliminary test for or against the validity of your hypothesis. Use markdown to render the equation neatly to the html file. (10%)