Bivariate EDA Preamble

What You Will Learn in This Section

This section builds upon the univariate foundations of Exploratory Data Analysis (EDA) by introducing techniques for analyzing relationships between two continuous variables. While univariate analysis focuses on understanding the distribution of a single variable, bivariate analysis seeks to uncover patterns, associations, and dependencies between variables. The goal is not to confirm hypotheses, but to develop a visual and conceptual toolkit for modeling, diagnosing, and refining relationships in two-dimensional data.


Chapter 26: Fitting and Exploring Bivariate Models

This chapter introduces the foundational tools for exploring relationships between two variables. You’ll learn how to use scatter plots to visualize associations and how to fit polynomial models (e.g., linear, quadratic) to capture trends. The chapter also introduces the concept of residuals as a way to assess model fit.
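
As a rough illustration of fitting a model and reading its residuals, the sketch below fits a straight line by ordinary least squares using only the Python standard library. The function names and the data are made up for this example; they are not taken from the chapter. The data follow a quadratic trend on purpose, so the straight line leaves a systematic pattern in the residuals.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def line_residuals(x, y, intercept, slope):
    """Observed value minus fitted value, one residual per point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Made-up data: y is exactly x squared, so a straight line is the wrong model.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

intercept, slope = fit_line(x, y)
res = line_residuals(x, y, intercept, slope)
print(intercept, slope)  # -7.0 6.0
print(res)               # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of leftover structure that suggests adding a quadratic term.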

Chapter 27: Non-parametric Bivariate Modeling with Loess

When the form of the relationship between variables is unknown or nonlinear, loess smoothing offers a flexible, data-driven alternative to polynomial models. This chapter explains how loess works, how to tune its parameters (span and degree), and when to use it.
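
The idea behind loess can be sketched in a few lines. The version below is a deliberately simplified assumption, not the algorithm as implemented in any particular package: it fits only a local straight line (degree 1), uses tricube weights, and skips the optional robustness iterations. The name loess_at and its default span are hypothetical.

```python
def loess_at(x0, x, y, span=0.75):
    """Local linear (degree-1) fit evaluated at x0 with tricube weights."""
    n = len(x)
    k = min(n, max(2, round(span * n)))   # number of points in the window
    dist = [abs(xi - x0) for xi in x]
    h = sorted(dist)[k - 1]               # window half-width
    # Tricube weights: near 1 close to x0, falling to 0 at the window edge.
    w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist] if h > 0 else []
    sw = sum(w)
    if sw == 0:
        # Degenerate window (ties at the boundary): average the nearest points.
        near = [yi for yi, d in zip(y, dist) if d <= h]
        return sum(near) / len(near)
    # Weighted least-squares line through the points in the window.
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return yw + slope * (x0 - xw)

# Made-up data lying exactly on a line: the smoother should reproduce it.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
smooth = [loess_at(xi, x, y, span=0.5) for xi in x]
```

A larger span pulls more points into each local fit and gives a smoother curve; a smaller span follows the data more closely.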

Chapter 28: Model Residuals

Residuals are central to understanding how well a model captures the structure in the data. This chapter introduces residual-dependence and residual-fit plots, which help diagnose model inadequacies such as curvature that the fitted model fails to capture.

Chapter 29: Exploring Spread in the Residuals

This chapter focuses on heteroscedasticity: situations where the spread of the residuals changes across the range of the independent variable. You’ll learn to use spread-location and spread-dependence plots to detect and address unequal spread.
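
One common way to build a spread-location plot is to pair each fitted value with the square root of the absolute residual. The sketch below assumes that construction and uses made-up fitted values and residuals; the function name is hypothetical.

```python
from math import sqrt
from statistics import median

def spread_location(fitted, residuals):
    """Pairs (fitted value, sqrt(|residual|)), sorted by fitted value."""
    return sorted((f, sqrt(abs(r))) for f, r in zip(fitted, residuals))

# Made-up values in which the residual spread grows with the fitted value.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
resid = [0.1, -0.1, 0.4, -0.4, 0.9, -0.9]

sl = spread_location(fitted, resid)
half = len(sl) // 2
low_spread = median(s for _, s in sl[:half])    # typical spread, small fits
high_spread = median(s for _, s in sl[half:])   # typical spread, large fits
print(low_spread < high_spread)  # True
```

Comparing a typical spread value across the lower and upper halves of the fitted values is only a crude numeric stand-in for the plot itself, which would show the full trend.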

Chapter 30: Visualizing Variability Decomposition in Bivariate Models

Here, we introduce the variability decomposition (VD) plot, which visually separates the variability explained by the model from the residual variability. Comparing the two shows how much of the variation in the data the model accounts for.

Chapter 31: Bivariate Residual-Fit Spread Plot

The residual-fit spread (RFS) plot offers a quantile-based comparison of the spread of the fitted values with the spread of the residuals. It complements the VD plot by providing a more detailed view of model performance.

Chapter 32: Parameterizing the Residuals

This chapter explores how to characterize residuals using statistical distributions, particularly the Normal distribution. You’ll learn how to use Q-Q plots to assess normality and what to do when residuals deviate from Normality.
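
The coordinates of a Normal Q-Q plot can be computed with the standard library alone. The sketch below is an illustration under stated assumptions: it uses the plotting positions (i - 0.5) / n, which is one common convention among several, and the function name and residual values are made up.

```python
from statistics import NormalDist

def normal_qq(values):
    """Pairs (theoretical Normal quantile, sorted sample value)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))

# Made-up residuals.
resid = [2.0, -1.0, -2.0, -1.0, 2.0]
pairs = normal_qq(resid)
```

If the residuals are roughly Normal, the pairs fall close to a straight line when plotted; systematic bends point to skew or heavy tails.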

Chapter 33: Refining Bivariate Models Through Re-expression

When model assumptions are violated, re-expressing one or both variables can improve model fit and stabilize residual spread. This chapter walks through a real-world example of iterative model refinement using power transformations.
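
A power transformation can be sketched as a single function. The convention assumed here, with a power of 0 standing for the logarithm, is a common one but is not confirmed by the text above; the function name and data are made up, and the values must be positive.

```python
from math import log

def re_express(values, p):
    """Raise positive values to the power p; p == 0 means the natural log."""
    if p == 0:
        return [log(v) for v in values]
    return [v ** p for v in values]

# Made-up data that grow by a constant factor at each step.
y = [1, 10, 100, 1000]
log_y = re_express(y, 0)
steps = [b - a for a, b in zip(log_y, log_y[1:])]  # equal steps after the log
```

After the log re-expression, the steps between successive values are equal, so a relationship that curved sharply on the original scale becomes a straight line.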

Chapter 34: Robust Regression: Resistant Lines and Beyond

Outliers can distort traditional regression models. This chapter introduces robust regression techniques, including Tukey’s resistant line and bisquare regression, which reduce the influence of extreme values.
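
The three-group idea behind a resistant line can be sketched as follows. This is a single-pass simplification under stated assumptions: the outer groups each take a plain third of the points (rounded down), the intercept is the median of the leftover offsets, and the iterative slope refinement is omitted. The function name and data are made up.

```python
from statistics import median

def resistant_line(x, y):
    """Single-pass three-group resistant line; returns (intercept, slope)."""
    pts = sorted(zip(x, y))        # order the points by x
    k = len(pts) // 3              # size of each outer group (simplified rule)
    left, right = pts[:k], pts[-k:]
    x_left = median(p[0] for p in left)
    y_left = median(p[1] for p in left)
    x_right = median(p[0] for p in right)
    y_right = median(p[1] for p in right)
    # The slope comes from the medians of the outer groups only.
    slope = (y_right - y_left) / (x_right - x_left)
    # The intercept is the median of what remains once the slope is removed.
    intercept = median(yi - slope * xi for xi, yi in pts)
    return intercept, slope

# Made-up data: y = 2x, except for one wild value at the end.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 6, 8, 10, 12, 14, 16, 100]
intercept, slope = resistant_line(x, y)
print(intercept, slope)  # 0.0 2.0
```

Because the line is built from medians, the single extreme value leaves the slope and intercept unchanged, whereas a least-squares fit would be pulled toward it.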

Chapter 35: Slicing Data: Exploring Discontinuities and Local Models

Sometimes, a single global model is insufficient. This chapter introduces data slicing and local modeling to uncover structural breaks or regime shifts in the data.

Chapter 36: A Deeper Exploration of Residuals: Revealing Hidden Structure

Residuals can reveal more than just model misfit: they can expose layered patterns such as trends, seasonality, and anomalies. This chapter demonstrates how to iteratively model and analyze residuals to uncover hidden structure in real-world data.


The Big Picture

Together, the chapters in this section form a coherent arc for understanding and modeling relationships between two continuous variables:

  • Visualize the relationship between variables using scatter plots and fitted curves.
  • Model the association using parametric (e.g., polynomial) and non-parametric (e.g., loess) techniques.
  • Diagnose model fit using residual plots to uncover unmodeled curvature and heteroscedasticity.
  • Quantify explained versus unexplained variation using variability decomposition and residual-fit spread plots.
  • Re-express variables to stabilize spread, improve fit, and meet model assumptions.
  • Refine models using robust regression techniques to mitigate the influence of outliers.
  • Explore local structure and discontinuities by slicing data and fitting segmented models.
  • Iterate through layers of residual analysis to reveal hidden structure and deepen understanding.

This sequence equips you with a flexible and visual toolkit for uncovering patterns, refining models, and interpreting complex relationships in bivariate data.