Exploratory Data Analysis in R


Manny Gimond


February 18, 2024


This book is a compilation of lecture notes used in an Exploratory Data Analysis in R course taught to undergraduates at Colby College. The course, first taught in Spring 2015, assumes little to no background in quantitative analysis or computer programming. It introduces students to data manipulation in R, data exploration (in the spirit of John Tukey’s EDA) and the R Markdown language. Many of the visualization techniques are adopted from William Cleveland’s book Visualizing Data.

The base R plotting environment and the ggplot2 ecosystem are used throughout this book. While a chapter is dedicated to the lattice plotting package, its functions are not used outside of that chapter given that ggplot2 offers much of lattice’s functionality.

While great effort is made to adopt a consistent plotting environment throughout this book (ggplot2, for the most part), a few topics (including the q-q plot and the median polish) benefit from custom plotting functions available in the tukeyedar package. The package can be downloaded from GitHub via the command:


Note that installing the GitHub package will require that the devtools package be installed first.
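
A minimal sketch of the installation steps described above. The repository path mgimond/tukeyedar is an assumption (it is not stated here), so adjust it if the package is hosted elsewhere:

```r
# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumed GitHub repository path for the tukeyedar package
devtools::install_github("mgimond/tukeyedar")

# Load the package to make functions such as eda_qq() available
library(tukeyedar)
```

The requireNamespace() check simply avoids reinstalling devtools when it is already present.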

Code that uses tukeyedar functions is highlighted in a peach/pink code block, as opposed to the default light yellow code block used for all other code. For example, if tukeyedar’s eda_qq function is used, the code block will take on the following appearance:

eda_qq(Tenor, Bass)

The tukeyedar functions are built on base R graphics and require R version 4.1 or greater.

Manuel “Manny” Gimond