35 Slicing Data: Exploring Discontinuities and Local Models

R packages used: dplyr 1.1.4, ggplot2 3.5.2, tidyr 1.3.1

So far, we’ve assumed that the underlying process that relates the \(X\) and \(Y\) variables is homogeneous across the full range of \(X\) values. This assumption simplifies model fitting strategies. But sometimes, that assumption does not hold.

When this assumption fails, a single model may obscure important structure in the data. In such cases, slicing the data into segments, each governed by its own model, can reveal patterns that would otherwise remain hidden.


35.1 A synthetic example: Visualizing breaks in linearity

We’ll begin by fitting a simple linear model to a synthetic dataset.
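The chapter's own synthetic dataset is not reproduced in this section, so the sketch below generates a hypothetical stand-in: three linear pieces joined at \(x = 95\) and \(x = 106\), plus Gaussian noise. The slopes (0.5, 0.1, 1.2), the range of \(x\), the noise level, and the names `df` and `M_global` are assumptions made for illustration only. It fits one global line with base R's `lm()`; base graphics are used instead of ggplot2 to keep the sketch dependency-free.

```r
# Hypothetical stand-in for the chapter's synthetic data (NOT the original):
# three linear pieces joined at x = 95 and x = 106, plus Gaussian noise
set.seed(1)
x  <- sort(runif(200, min = 80, max = 120))
mu <- ifelse(x < 95,  0.5 * x,
      ifelse(x < 106, 47.5 + 0.1 * (x - 95),
                      48.6 + 1.2 * (x - 106)))
df <- data.frame(x = x, y = mu + rnorm(200, sd = 0.5))

# One global least-squares line across the full range of x
M_global <- lm(y ~ x, data = df)
print(coef(M_global))

# Scatter plot with the global fit overlaid
plot(y ~ x, data = df, pch = 16, col = "grey50")
abline(M_global, col = "red", lwd = 2)
```

Because the pieces are continuous, the global line tracks the overall upward trend and the mismatch only becomes obvious in the residuals.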

The fitted line captures the overall trend reasonably well, but the relationship doesn’t appear to be strictly linear. To investigate further, we’ll examine the residuals with a residual-dependence plot, which plots the model residuals against \(x\).

The residual plot, with a loess curve fitted to the residuals, reveals a dip between \(x \approx 95\) and \(x \approx 107\) followed by an upward trend. These kinks suggest that the data may be better modeled using three distinct linear segments, each with its own slope and intercept. A small loess span keeps the smooth local, which helps highlight these deviations rather than averaging them away.
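A residual-dependence plot like the one described above might be sketched as follows, again on hypothetical stand-in data (three linear pieces joined at \(x = 95\) and \(x = 106\); not the chapter's dataset). The span of 0.2 is an assumed "small" value, not one taken from the chapter; larger spans (the `loess()` default is 0.75) tend to flatten local kinks.

```r
# Hypothetical stand-in data: three linear pieces joined at x = 95 and x = 106
set.seed(1)
x  <- sort(runif(200, min = 80, max = 120))
mu <- ifelse(x < 95,  0.5 * x,
      ifelse(x < 106, 47.5 + 0.1 * (x - 95),
                      48.6 + 1.2 * (x - 106)))
df <- data.frame(x = x, y = mu + rnorm(200, sd = 0.5))

# Residuals from a single global line
M_global <- lm(y ~ x, data = df)
df$res   <- residuals(M_global)

# Loess smooth of the residuals; span = 0.2 is an assumed small value
# so the curve can follow local kinks instead of smoothing them away
lo <- loess(res ~ x, data = df, span = 0.2)
df$smooth <- predict(lo)

# Residual-dependence plot: residuals vs x, with a zero reference line
plot(res ~ x, data = df, pch = 16, col = "grey50", ylab = "Residual")
abline(h = 0, lty = 2)
lines(df$x, df$smooth, col = "blue", lwd = 2)  # df is sorted by x
```

With these assumed slopes, the residuals sit mostly below zero between roughly 95 and 106 and climb steadily afterward, which is the kind of pattern the text describes.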

To explore this further, we divide the data into three groups based on the apparent breakpoints: \(x < 95\), \(95 \le x < 106\), and \(x \ge 106\). We’ll label these groups 1, 2, and 3.
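One way to assign these labels in base R is `cut()` with `right = FALSE`, which closes each interval on the left so that a value of exactly 95 lands in group 2 and exactly 106 in group 3, matching the inequalities above. The data frame `df` here holds hypothetical \(x\) values only.

```r
# Hypothetical x values standing in for the chapter's data
set.seed(1)
df <- data.frame(x = sort(runif(200, min = 80, max = 120)))

# Intervals [-Inf, 95), [95, 106), [106, Inf) labelled 1, 2, 3
df$group <- cut(df$x, breaks = c(-Inf, 95, 106, Inf),
                labels = c("1", "2", "3"), right = FALSE)
print(table(df$group))

# Boundary check: 95 belongs to group 2 and 106 to group 3
print(cut(c(94.9, 95, 105.9, 106), breaks = c(-Inf, 95, 106, Inf),
          labels = c("1", "2", "3"), right = FALSE))
```

With dplyr loaded, `mutate(df, group = case_when(x < 95 ~ 1, x < 106 ~ 2, TRUE ~ 3))` would give the same grouping as numbers rather than factor levels, since `case_when()` evaluates its conditions in order.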

The faceted plots confirm our earlier suspicion: each segment appears to follow a distinct linear trend.
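Fitting a separate line within each group can be sketched with `split()` and `lapply()`. The data are hypothetical (assumed true slopes 0.5, 0.1, and 1.2), so the estimated slopes should land near those values; the three-panel base-graphics layout is a stand-in for a ggplot2 faceted plot.

```r
# Hypothetical stand-in data: three linear pieces joined at x = 95 and x = 106
set.seed(1)
x  <- sort(runif(200, min = 80, max = 120))
mu <- ifelse(x < 95,  0.5 * x,
      ifelse(x < 106, 47.5 + 0.1 * (x - 95),
                      48.6 + 1.2 * (x - 106)))
df <- data.frame(x = x, y = mu + rnorm(200, sd = 0.5))
df$group <- cut(df$x, breaks = c(-Inf, 95, 106, Inf),
                labels = c("1", "2", "3"), right = FALSE)

# Fit an independent least-squares line within each segment
fits   <- lapply(split(df, df$group), function(d) lm(y ~ x, data = d))
slopes <- sapply(fits, function(m) coef(m)[["x"]])
print(round(slopes, 2))

# One panel per segment, each with its own fitted line
op <- par(mfrow = c(1, 3))
for (g in names(fits)) {
  d <- df[df$group == g, ]
  plot(y ~ x, data = d, pch = 16, col = "grey50", main = paste("Group", g))
  abline(fits[[g]], col = "red", lwd = 2)
}
par(op)
```

In ggplot2, a comparable view would be `ggplot(df, aes(x, y)) + geom_point() + geom_smooth(method = "lm", formula = y ~ x) + facet_wrap(~ group)`.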

This example illustrates how a single global model may obscure important structure in the data. By examining residuals and fitting local models, we can uncover distinct regimes that reflect different underlying processes.
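To put a number on how much the single global line misses, one option is to compare residual sums of squares. The sketch below, on hypothetical stand-in data, fits `y ~ x * group`, which gives each segment its own intercept and slope (the same fitted lines as fitting each segment separately), and compares its RSS, via `deviance()`, with that of the global line. Because the segmented model contains the global line as a special case, its RSS can never be larger; how much smaller it is gives a rough sense of the structure the global fit was hiding.

```r
# Hypothetical stand-in data: three linear pieces joined at x = 95 and x = 106
set.seed(1)
x  <- sort(runif(200, min = 80, max = 120))
mu <- ifelse(x < 95,  0.5 * x,
      ifelse(x < 106, 47.5 + 0.1 * (x - 95),
                      48.6 + 1.2 * (x - 106)))
df <- data.frame(x = x, y = mu + rnorm(200, sd = 0.5))
df$group <- cut(df$x, breaks = c(-Inf, 95, 106, Inf),
                labels = c("1", "2", "3"), right = FALSE)

# Single global line vs. one line per segment
# (x * group expands to x + group + x:group)
M_global <- lm(y ~ x, data = df)
M_slice  <- lm(y ~ x * group, data = df)

# Residual sum of squares for each model
rss <- c(global = deviance(M_global), sliced = deviance(M_slice))
print(round(rss, 1))
```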

In the next section, we’ll see how this approach applies to real-world data where the breaks may not be as visually obvious.

35.3 Summary

In earlier chapters, we assumed that a single model could describe the relationship between two variables across the entire range of data. This chapter challenged that assumption by introducing the concept of slicing: dividing the data into segments where different models may apply.

Using residual plots and loess fits, we saw how structural breaks, or changepoints, can be detected visually. Such breaks suggest that different processes may govern the data in different ranges of the independent variable.

The synthetic example demonstrated how residual patterns can reveal hidden linear segments, while the temperature case study illustrated how domain knowledge can help interpret such patterns meaningfully.

Ultimately, slicing is a powerful exploratory tool that helps uncover non-monotonic trends, regime shifts, and localized behaviors that a single global model might obscure.

35.4 References

Original paper: Vincent, L. A., et al., 2005. Observed trends in indices of daily temperature extremes in South America 1960–2000. J. Climate, 18, 5011–5023.

Comment on the paper: Stone, R. J., 2011. Comments on “Observed trends in indices of daily temperature extremes in South America 1960–2000.” J. Climate, 24, 2880–2883.

Reply to the comment: Vincent, L. A., et al., 2011. Reply. J. Climate, 24, 2884–2887.