Proposal

Due: Friday, September 22 at 11:59 PM

Purpose

The purpose of the team project is to provide an opportunity to complete a data analysis project from start to finish. This includes:

  • Selecting data

  • Writing research questions

  • Performing exploratory data analysis

  • Developing appropriate statistical models

  • Communicating results through a written report and presentation

  • Working with a team

Teams

See this sheet for your team assignment. You are responsible for dividing the work equitably among team members. Every team member is required to contribute to each portion of the project: coding, writing, and presenting. Learn from each other and, most of all, be kind. You will receive individual grades for the project that incorporate feedback from fellow team members.

Data

Your first step is to select the dataset that you are most interested in analyzing. Below is a list of datasets and data sources that you may consider. You are not required to use these.

Your dataset must have at least 500 observations, 10 variables, and a mix of numeric and categorical variables. You may not use a dataset that we have used in class.

  1. R Data Sources for Regression Analysis

  2. FiveThirtyEight data

  3. World Health Organization

  4. The National Bureau of Economic Research

  5. International Monetary Fund

  6. General Social Survey

  7. United Nations Data

  8. Pew Research

  9. 2021 CDC Behavioral Risk Factor Surveillance System Survey

  10. World Inequality Database

You are welcome to specify a subset of a particular dataset (e.g., only North Carolina in the CDC BRFSS). Be sure to look at the data to understand its scope.
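As a rough sketch of how a subset might be checked against the size and variable-type requirements above, assuming those requirements apply to the subset you actually analyze: the `survey` data frame, its columns, and the `check_requirements()` helper are hypothetical stand-ins and do not come from any of the listed sources. Categorical variables stored as numeric codes would be counted as numeric here, so this does not replace looking at the data yourself.

```r
# Hypothetical toy data standing in for a real survey dataset.
set.seed(1)
n <- 2000
survey <- data.frame(
  state     = sample(c("NC", "SC", "VA"), n, replace = TRUE),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  smoker    = sample(c(TRUE, FALSE), n, replace = TRUE),
  education = factor(sample(c("HS", "College", "Grad"), n, replace = TRUE)),
  age       = sample(18:90, n, replace = TRUE),
  height_cm = rnorm(n, 170, 10),
  weight_kg = rnorm(n, 75, 15),
  sleep_hrs = rnorm(n, 7, 1),
  income    = rlnorm(n, 10.5, 0.5),
  visits    = rpois(n, 3)
)

# Hypothetical helper: checks the row count, the column count, and
# whether both numeric and categorical columns are present.
check_requirements <- function(df, min_rows = 500, min_cols = 10) {
  is_num <- vapply(df, is.numeric, logical(1))
  is_cat <- vapply(
    df,
    function(x) is.factor(x) || is.character(x) || is.logical(x),
    logical(1)
  )
  c(
    enough_rows     = nrow(df) >= min_rows,
    enough_cols     = ncol(df) >= min_cols,
    has_numeric     = any(is_num),
    has_categorical = any(is_cat)
  )
}

# Keep only one state, then confirm the subset still qualifies.
nc_only <- survey[survey$state == "NC", ]
check_requirements(nc_only)
```

If any element of the result is `FALSE`, the subset is probably too narrow and a broader slice of the data may be needed.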

Proposal

Your proposal should list the following:

  1. Team member names
  2. Two or three datasets of interest (listed in order of preference, with 1 being the top choice). For each dataset, provide:
  • The source of the data, when and how it was originally collected, and a brief description of the observations. This information is likely found on the website where you found the data, so be sure to cite your source.

  • TWO research questions you are interested in exploring. Explicitly state the outcome variable in the dataset that you will use to answer each question. You are required to use two different types of outcome variables for your research questions. For example, one research question may use a continuous outcome variable while the other uses an ordinal outcome variable. Variable types may include: continuous, binary, ordinal, nominal, and time-to-event.

  • A glimpse() of each dataset to show that you can access it in R.
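A minimal sketch of what the glimpse() step might look like, assuming the dplyr package is installed. The `toy` data frame is a hypothetical stand-in; in an actual proposal it would be replaced by the dataset you read into R.

```r
library(dplyr)

# Hypothetical toy data; replace with your own dataset once it is loaded.
toy <- tibble(
  id    = 1:5,
  group = c("a", "b", "a", "c", "b"),
  score = c(2.5, 3.1, 4.0, 1.8, 3.3)
)

# Prints the number of rows and columns, then one line per variable
# showing its type and its first few values.
glimpse(toy)
```

The printed summary gives a quick view of the number of observations, the number of variables, and the type of each variable.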

Consult this helpful guide for writing a good research question.

No data cleaning is required in the proposal. Your proposal should be generated with a Quarto document, but you will not be required to submit the code.

Submit one proposal per group. One person will submit and select the other group members in the Gradescope submission. After you submit, I will provide feedback to help you decide which dataset to choose for your project.