ggplot2 | tukeyedar |
---|---|
3.5.1 | 0.4.0 |
27 Exploring spread in the residuals
So far, we’ve focused on modeling the typical value of
In the univariate analysis portion of this course, we emphasized the importance of maintaining a consistent spread of residuals across groups. A uniform residual spread simplified comparisons between groups by reducing the analysis to a comparison of their means.
Similarly, ensuring a consistent spread of residuals across the full range of the independent variable in bivariate analysis is crucial. This consistency not only offers explanatory clarity but is also critical for many statistical procedures that assume homoscedasticity (constant variance) in the residuals. Violations of this assumption can compromise the validity of these methods, emphasizing the importance of carefully evaluating residual behavior during model assessment.
27.1 The spread-location plot
While an inconsistent spread across the full range of the independent variable can sometimes be seen in a residual-dependence plot, certain patterns in the data can make it difficult to assess in that plot.
A spread-location plot (S-L plot) is designed to explore changes in spread as a function of increasing fitted values. It plots the square root of the absolute residuals against the fitted values.
An example of a homoscedastic set of residuals follows. The plot on the left is the regression model and the plot on the right is the resulting residuals S-L plot.
Here, the spread of the residuals is roughly constant across the full range of fitted values. This is confirmed by the loess fit, which shows no appreciable departure from a horizontal line.
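The quantity plotted on the y-axis of an S-L plot can be computed directly from a fitted model. The sketch below uses simulated data and made-up names (x, y_homo, y_hetero, spread), not the data behind the figures in this chapter. It contrasts a model whose noise is constant with one whose noise grows with x, the case taken up next, and summarizes each with the correlation between spread and fitted value.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 1000
x <- seq(1, 10, length.out = n)

# Same linear trend, two noise structures (both are illustrative assumptions)
y_homo   <- 2 + 3 * x + rnorm(n, sd = 1)        # constant noise
y_hetero <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)  # noise grows with x

# Spread as used in an S-L plot: square root of the absolute residuals
spread <- function(model) sqrt(abs(residuals(model)))

M_homo   <- lm(y_homo ~ x)
M_hetero <- lm(y_hetero ~ x)

# Correlation between spread and fitted values: near 0 when the spread is
# constant, clearly positive when the spread grows with the fitted values
cor_homo   <- cor(fitted(M_homo),   spread(M_homo))
cor_hetero <- cor(fitted(M_hetero), spread(M_hetero))
```

A correlation is only a crude numeric stand-in for the loess curve. It would miss a spread that rises and then falls, which the plot itself would reveal.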
This next example is that of a model that generates a heteroscedastic set of residuals.
The increasing spread as a function of increasing fitted value is apparent in the S-L plot (right-plot). It can also be observed in the
27.2 Variation of the S-L plot
For bivariate models, an alternative to the S-L plot is the spread-dependence (S-D) plot where the independent variable, rather than the fitted values, is plotted along the x-axis.
The heteroscedasticity in the residuals is far more pronounced when plotting the spread as a function of the independent variable.
27.3 Generating an S-L plot with eda_sl
If a regression model was generated using the base lm function or tukeyedar’s eda_lm function, the resulting model can be passed to the eda_sl function as follows:
library(tukeyedar)
# Fit the regression model, then pass it to eda_sl
M <- lm(mpg ~ hp, mtcars)
eda_sl(M)
To generate an S-D plot, set the argument type to "dependence".
eda_sl(M, type = "dependence")
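To see what the S-D variant changes, the spread can also be plotted by hand against the independent variable. This is a rough base R sketch, not a reproduction of the eda_sl output. The data frame name sd_df and the use of lowess() for the smooth are choices made here for illustration.

```r
M <- lm(mpg ~ hp, mtcars)

# Spread (square root of absolute residuals) paired with the independent variable
sd_df <- data.frame(hp     = mtcars$hp,
                    spread = sqrt(abs(residuals(M))))

plot(spread ~ hp, sd_df,
     xlab = "hp (independent variable)",
     ylab = expression(sqrt(abs(residuals))))
lines(lowess(sd_df$hp, sd_df$spread), col = "red")  # lowess smooth as a visual guide

# For a straight-line fit, the fitted values are a linear function of hp, so
# hp and the fitted values are perfectly correlated (negatively here, since
# the slope of mpg on hp is negative)
r <- cor(sd_df$hp, fitted(M))
```

For a straight-line model such as this one, the S-D and S-L plots therefore show the same pattern up to a rescaling (and here a reversal) of the x-axis. For models that are not straight lines, the two x-axes are no longer linearly related and the plots can look different.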
27.4 Generating an S-L plot with base plot or ggplot
Before generating an S-L plot using the base plotting environment or ggplot, the spread will need to be computed from the model output.
library(ggplot2)
# Spread: square root of the absolute residuals, paired with the fitted values
sl2 <- data.frame(std.res = sqrt(abs(residuals(M))),
                  fit     = predict(M))
ggplot(sl2, aes(x = fit, y = std.res)) + geom_point() +
  stat_smooth(method = "loess", se = FALSE, span = 1,
              method.args = list(degree = 1)) +
  ylab(expression(sqrt(abs(residuals)))) +
  xlab("Fitted values")
The function predict() extracts the fitted y-values from the model M. These fitted values are plotted along the x-axis.
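The same S-L plot can be put together with base graphics. The following is one possible sketch, assuming the model M from the mtcars example. The loess settings are chosen to mirror the ggplot call above (span = 1, degree = 1), and the object names lo and ord are introduced here for illustration.

```r
M <- lm(mpg ~ hp, mtcars)

# Spread (square root of absolute residuals) and fitted values
sl2 <- data.frame(std.res = sqrt(abs(residuals(M))),
                  fit     = predict(M))

# Fit a first-degree loess to the spread, using the full span
lo  <- loess(std.res ~ fit, sl2, span = 1, degree = 1)
ord <- order(sl2$fit)   # sort by fitted value so the line is drawn left to right

plot(std.res ~ fit, sl2,
     xlab = "Fitted values",
     ylab = expression(sqrt(abs(residuals))))
lines(sl2$fit[ord], predict(lo)[ord], col = "blue")
```

Sorting by fitted value before calling lines() matters: without it, the smooth would be drawn in row order and zigzag across the plot.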