Hypothesis testing is a procedure for assessing how strongly the data support or contradict claims (hypotheses) about the population.
We use information from the sample (data) to draw conclusions about the population.
Usually we have two hypotheses to test:
Null (\(H_0\)): Formally, the null hypothesis makes a claim or assumption about a population parameter. Conceptually, it’s the “nothing special is going on” hypothesis.
Alternative (\(H_A\)): A statement that is contradictory to the null hypothesis. Conceptually, it is the “something special is going on” or “the variables are associated” hypothesis.
As we test the hypotheses, do you think it would be better scientific practice to operate under the assumption that the null hypothesis is true (i.e., nothing special is going on) or that the alternative hypothesis is true (something special is going on/the variables are related)?
For example, imagine you work for the FDA and you are assessing whether or not a new drug should be approved. Do you think it is better practice to 1) assume that the drug is not effective until the data prove otherwise, or 2) assume that the drug is effective until the data prove otherwise?
The p-value is the probability of observing data at least as extreme as the data actually observed, assuming that \(H_0\) is true.
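As a sketch in symbols, write \(T\) for the test statistic and \(t_{\text{obs}}\) for its observed value (this notation is an assumption made here, not taken from the original notes). For an upper-tailed test, the p-value can be written as

\[
p\text{-value} = P\left(T \ge t_{\text{obs}} \mid H_0\right)
\]

For a two-sided test, "more extreme" covers both tails, so the corresponding expression is \(P\left(|T| \ge |t_{\text{obs}}| \mid H_0\right)\) when the null distribution of \(T\) is symmetric around zero.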
Which region of the plot below constitutes the p-value?
The significance level is a predetermined threshold used to decide whether a result is statistically significant; it is usually denoted by \(\alpha\) and commonly set to 0.05.
Note: We can NEVER say that we accept the null hypothesis, even if the p-value is greater than the significance level. We can only say that we fail to reject it.
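To make the decision rule concrete, here is a minimal sketch in Python. The scenario is hypothetical and not from the original notes: a coin is flipped 100 times and lands heads 60 times, and we test \(H_0\): the coin is fair against a two-sided alternative. The names `n_flips`, `observed_heads`, and `binomial_pmf` are illustrative only.

```python
from math import comb

# Hypothetical data (an assumption for illustration): 60 heads in 100 flips.
n_flips = 100
observed_heads = 60
alpha = 0.05  # significance level, fixed before looking at the data


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Under H0 the coin is fair, so we expect n_flips * 0.5 heads.
expected_heads = n_flips * 0.5
observed_distance = abs(observed_heads - expected_heads)

# Two-sided p-value: total probability, under H0, of every outcome at least
# as far from the expected count as the one we observed.
p_value = sum(
    binomial_pmf(k, n_flips, 0.5)
    for k in range(n_flips + 1)
    if abs(k - expected_heads) >= observed_distance
)

# We either reject H0 or fail to reject it; we never "accept" H0.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}, decision at alpha = {alpha}: {decision}")
```

In this made-up example the p-value comes out slightly above 0.05, so we fail to reject \(H_0\). That does not show the coin is fair; it only means the data are not strong enough evidence against fairness at this significance level.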
Simulation-based inference
Simulate the null distribution with resampling methods
Very flexible, but requires computational power
Parametric inference
Make assumptions about the probability distribution of our population
Relies on statistical theory such as the central limit theorem
Simulation-based inference can be intuitive and is more flexible
Parametric inference has been around for a long time, and you will likely see it more often in practice
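The two approaches can be compared on the same data. The sketch below is only an illustration under stated assumptions: the two groups are simulated, hypothetical data; the simulation-based method shown is a permutation test (one of several resampling methods); and the parametric method shown is a two-sample z-test that leans on the central limit theorem. The function names `permutation_p_value` and `z_test_p_value` are made up for this example.

```python
import random
from statistics import NormalDist, fmean, variance

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical data (an assumption): scores for a treatment and a control group.
treatment = [rng.gauss(11.0, 2.0) for _ in range(40)]
control = [rng.gauss(10.0, 2.0) for _ in range(40)]
observed_diff = fmean(treatment) - fmean(control)


def permutation_p_value(a, b, n_resamples=5000):
    """Simulation-based: build the null distribution by shuffling group labels."""
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the labels are exchangeable, so any reshuffle is as likely.
        rng.shuffle(pooled)
        diff = fmean(pooled[: len(a)]) - fmean(pooled[len(a):])
        if abs(diff) >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_resamples + 1)


def z_test_p_value(a, b):
    """Parametric: assume the difference in means is approximately normal (CLT)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_sim = permutation_p_value(treatment, control)
p_param = z_test_p_value(treatment, control)
print(f"observed difference in means: {observed_diff:.3f}")
print(f"simulation-based p-value:     {p_sim:.4f}")
print(f"parametric p-value:           {p_param:.4f}")
```

With moderately large samples like these, the two p-values should be close. The permutation test spends computation on thousands of reshuffles, while the z-test needs only a formula but depends on the normal approximation being reasonable.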