28  Parameterizing the residuals

Package versions used: ggplot2 3.5.1, tukeyedar 0.4.0

In the univariate section of this course, we learned that parameterizing residuals offers many benefits, including the ability to succinctly characterize spread using a single parameter, such as the standard deviation. While the standard deviation is broadly applicable, it is often strongly associated with the Normal distribution. This association arises because, in many statistical methods and contexts (e.g., hypothesis testing and confidence intervals), the assumption of normality simplifies interpretation and analysis.

28.0.1 Checking residuals for normality

If you plan to conduct a hypothesis test (e.g., to address the question “is the slope significantly different from 0?”), you will likely want to check the residuals for normality, since normality is an assumption made when computing a confidence interval or a p-value.
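As a minimal sketch of where these quantities appear in R, the example below fits the mtcars regression of mpg on hp used in this section, then extracts the slope's test statistics and confidence interval. The object names fit, slope_test and slope_ci are illustrative choices, not names from the course material.

```r
# Illustrative sketch: object names below are assumptions, not from the course
fit <- lm(mpg ~ hp, data = mtcars)

# Estimate, standard error, t value and p-value for the slope (hp) term
slope_test <- summary(fit)$coefficients["hp", ]
slope_test

# 95% confidence interval for the slope
slope_ci <- confint(fit, "hp", level = 0.95)
slope_ci
```

Both the p-value and the interval are derived from a t distribution, which is why the normality of the residuals matters here.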

We learned that a Normal quantile-quantile (Q-Q) plot is well suited for comparing a batch of values to a Normal distribution. For example, using the eda_theo function from the tukeyedar package, we can compare the model’s residuals to a unit Normal distribution.

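library(tukeyedar)

# Regress miles per gallon on horsepower
M <- lm(mpg ~ hp, mtcars)

# Extract the residuals and compare them to a Normal distribution
residuals <- M$residuals
eda_theo(residuals)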

The same theoretical Q-Q plot can be generated with ggplot2 using the stat_qq and stat_qq_line functions.

library(ggplot2)

ggplot() + aes(sample = residuals) + 
  stat_qq(distribution = qnorm) +
  stat_qq_line(distribution = qnorm, col = "blue") +
  xlab("Normal") + ylab("Residuals")
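To see what a Normal Q-Q plot computes, the following base R sketch rebuilds the points and the reference line by hand. It assumes the defaults used by stat_qq and stat_qq_line (plotting positions from ppoints, and a line through the first and third quartiles). eda_theo may use a different plotting-position formula, so its points can differ slightly. The names res, theo, emp, slope and intercept are illustrative.

```r
# Illustrative sketch: variable names are assumptions, not from the course
M <- lm(mpg ~ hp, data = mtcars)
res <- residuals(M)

n    <- length(res)
p    <- ppoints(n)   # plotting positions (probabilities) for n values
theo <- qnorm(p)     # matching quantiles of the unit Normal distribution
emp  <- sort(res)    # ordered residuals (empirical quantiles)

# Each point pairs a theoretical quantile with an empirical one
plot(theo, emp, xlab = "Normal", ylab = "Residuals")

# Reference line through the first and third quartiles of both batches
y <- unname(quantile(emp, c(0.25, 0.75)))
x <- qnorm(c(0.25, 0.75))
slope     <- diff(y) / diff(x)
intercept <- y[1] - slope * x[1]
abline(intercept, slope, col = "blue")
```

If the residuals follow a Normal distribution, the points should fall close to the line. Systematic departures in the tails suggest skew or heavy tails.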