Start with a research question
Based on the research question, we can identify the population of interest.
Often, it is unrealistic to collect data on the entire population, so we collect a sample.
But how?
Say I want five of you to come to the front of the class. How could I choose five people?
Simple random sampling (SRS): gold standard, but not always practical
Many statistical methods assume simple random sampling
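One way to turn the "five people to the front" question into a simple random sample is to let software do the choosing, so that every possible group of five is equally likely. The following is a minimal sketch in Python, not the function used in class. The roster of 30 numbered students and the seed are assumptions made up for illustration.

```python
import random

# Hypothetical class roster: students labeled 1 through 30 (assumed for illustration).
roster = list(range(1, 31))

# Fixed seed (arbitrary choice) so the same draw can be reproduced later.
rng = random.Random(2024)

# Simple random sample: 5 students drawn without replacement,
# so every set of 5 students is equally likely to be chosen.
chosen = rng.sample(roster, 5)
print(chosen)
```

Dropping the seed gives a fresh draw each time, which is what you would want when actually picking volunteers.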
Stratified sampling: useful when the sizes of the strata differ from each other
Useful when the cases in each stratum are very similar with respect to the outcome
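To make the idea concrete, here is a rough sketch of a stratified sample in Python: split the population into strata, then take a simple random sample within every stratum. The population (100 students across four class years of unequal size) and the 10% sampling fraction are assumptions chosen only for illustration.

```python
import random

# Hypothetical population of 100 students; the strata are class years of unequal size.
years = ["first-year"] * 40 + ["sophomore"] * 30 + ["junior"] * 20 + ["senior"] * 10
population = list(enumerate(years))  # pairs of (student_id, class_year)

rng = random.Random(1)  # arbitrary seed for reproducibility

# Group student ids by stratum.
strata = {}
for student_id, year in population:
    strata.setdefault(year, []).append(student_id)

# Take a simple random sample of 10% of the cases within EVERY stratum.
stratified_sample = {
    year: rng.sample(ids, len(ids) // 10) for year, ids in strata.items()
}

for year, ids in stratified_sample.items():
    print(year, ids)
```

Because every stratum contributes cases, even the small senior group is guaranteed to be represented, which a single SRS of 10 students would not guarantee.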
Cluster sampling: can be more economical than SRS or stratified sampling
Most helpful when there is case-to-case variability within a cluster, but there is not much variability among the clusters themselves
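A cluster sample reverses the stratified logic: randomly pick a few whole clusters, then include every case inside them. Here is a minimal Python sketch, assuming a hypothetical setup of 6 dorm floors with 8 residents each (all names are made up).

```python
import random

# Hypothetical clusters: 6 dorm floors, each with 8 residents.
clusters = {
    f"floor_{f}": [f"floor_{f}_resident_{r}" for r in range(1, 9)]
    for f in range(1, 7)
}

rng = random.Random(7)  # arbitrary seed for reproducibility

# Step 1: randomly choose 2 of the 6 clusters.
chosen_clusters = rng.sample(sorted(clusters), 2)

# Step 2: include EVERY case from the chosen clusters, and none from the others.
cluster_sample = [person for name in chosen_clusters for person in clusters[name]]

print(chosen_clusters)
print(len(cluster_sample), "people sampled")
```

Only two floors need to be visited, which is where the savings come from. The cost is that the sample is only as varied as the chosen clusters are.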
RQ: What is the mean salary for all MLB players?
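For a question like this, the population mean is the parameter of interest and the mean of a sample is our estimate of it. The sketch below illustrates the distinction in Python using made-up numbers. The "salaries" are randomly generated for illustration and are NOT real MLB data; the population size of 1,000 and the sample size of 50 are also assumptions.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed for reproducibility

# Made-up population of 1,000 "salaries" (in $ millions); NOT real MLB data.
population = [round(rng.lognormvariate(0.5, 1.0), 2) for _ in range(1000)]

# Parameter: the mean over the entire population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: the mean of a simple random sample of 50 cases.
srs = rng.sample(population, 50)
sample_mean = statistics.mean(srs)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Rerunning with a different seed changes the sample mean but not the population mean, which is the sample-to-sample variability that statistical methods have to account for.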
Broadly speaking, there are two kinds of studies:
Under which design would you be more comfortable drawing causal conclusions?
A confounding variable is one that is associated with both the explanatory and response variables
Let's try the sample function:
Let's try it with our survey data from last week:
In your group, discuss how you can fill in the blanks to create a cluster sample in which you randomly select 2 values of the Interest variable.
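As a point of comparison for your discussion, the same three steps can be sketched in Python: list the distinct values of the variable, randomly pick 2 of them, and keep every row that has one of those values. This is not the class code. The survey rows and category names below are made up; only the variable name Interest comes from the exercise.

```python
import random

# Hypothetical survey rows; ids and category values are made up for illustration.
survey = [
    {"id": 1, "Interest": "sports"},
    {"id": 2, "Interest": "music"},
    {"id": 3, "Interest": "sports"},
    {"id": 4, "Interest": "art"},
    {"id": 5, "Interest": "gaming"},
    {"id": 6, "Interest": "music"},
    {"id": 7, "Interest": "art"},
    {"id": 8, "Interest": "gaming"},
]

rng = random.Random(3)  # arbitrary seed for reproducibility

# Step 1: list the distinct values of Interest; each value defines one cluster.
interest_values = sorted({row["Interest"] for row in survey})

# Step 2: randomly select 2 of those values.
chosen_values = rng.sample(interest_values, 2)

# Step 3: keep every row whose Interest is one of the chosen values.
cluster_sample = [row for row in survey if row["Interest"] in chosen_values]

print(chosen_values)
print(cluster_sample)
```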