ggplot2 | tukeyedar |
---|---|
3.5.1 | 0.4.0 |
28 Parameterizing the residuals
In the univariate section of this course, we learned that parameterizing residuals offers many benefits, including the ability to succinctly characterize spread using a single parameter, such as the standard deviation. While the standard deviation is broadly applicable, it is often strongly associated with the Normal distribution. This association arises because, in many statistical methods and contexts (e.g., hypothesis testing and confidence intervals), the assumption of normality simplifies interpretation and analysis.
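As a minimal sketch of what "a single parameter" means here, the spread of a model's residuals can be summarized in one number. The example assumes the mpg ~ hp model from the built-in mtcars data that is fitted later in this chapter; the object names (res, sd_res, sigma_res) are illustrative only.

```r
# Fit the same simple linear model used later in this chapter
M <- lm(mpg ~ hp, data = mtcars)
res <- residuals(M)

# Standard deviation of the residuals (n - 1 in the denominator)
sd_res <- sd(res)

# Residual standard error reported by lm (residual degrees of freedom,
# here n - 2, in the denominator)
sigma_res <- sigma(M)

c(sd = sd_res, sigma = sigma_res)
```

The two values differ slightly because they divide the same sum of squared residuals by different degrees of freedom.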
28.0.1 Checking residuals for normality
If you plan to conduct a hypothesis test (e.g., to address the question “is the slope significantly different from 0?”), you will likely want to check the residuals for normality, since normality is an assumption made when computing a confidence interval and a p-value.
We learned that a Normal quantile-quantile plot was well suited for this comparison. For example, using the eda_theo function, we can compare the model’s residuals to a unit Normal distribution.
library(tukeyedar)
M <- lm(mpg ~ hp, mtcars)   # fit the linear model
residuals <- M$residuals    # extract the model's residuals
eda_theo(residuals)         # Normal Q-Q plot of the residuals
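To make explicit what a Normal Q-Q plot pairs up, the following sketch builds the points by hand with base R only. It assumes the plotting positions returned by ppoints(), which is what base R's qqnorm() uses; other functions may use a slightly different convention. The names res and theo are illustrative.

```r
# Refit the model so the block is self-contained
M <- lm(mpg ~ hp, data = mtcars)

# Empirical quantiles: the residuals sorted from smallest to largest
res <- sort(residuals(M))
n <- length(res)

# Matching theoretical quantiles from a unit Normal distribution
theo <- qnorm(ppoints(n))

# Plot one against the other
plot(theo, res, xlab = "Normal", ylab = "Residuals")

# Reference line through the first and third quartiles
qqline(res)
```

If the residuals follow a Normal distribution, the points should fall close to the reference line.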
The same theoretical Q-Q plot can be generated with ggplot2 using the stat_qq and stat_qq_line functions.
library(ggplot2)
ggplot() + aes(sample = residuals) +
stat_qq(distribution = qnorm) +
stat_qq_line(distribution = qnorm, col = "blue") +
xlab("Normal") + ylab("Residuals")
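The normality check matters because of the inferential quantities mentioned at the start of this section. As a hedged sketch, the slope's p-value and confidence interval for the same model can be pulled from the fitted object with base R; the 95% level and the object names slope_row and slope_ci are choices made here for illustration.

```r
# Refit the model so the block is self-contained
M <- lm(mpg ~ hp, data = mtcars)

# Row of the coefficient table for the slope: estimate, standard error,
# t statistic and p-value for the test that the slope equals 0
slope_row <- summary(M)$coefficients["hp", ]

# 95% confidence interval for the slope
slope_ci <- confint(M, "hp", level = 0.95)

slope_row["Pr(>|t|)"]
slope_ci
```

Both quantities rely on the Normal assumption for the residuals, which is why the Q-Q plot is worth inspecting before interpreting them.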