Skip to contents

eda_lm generates a scatter plot with a fitted regression line. A loess line can also be added to the plot for model comparison. The axes are scaled such that their respective standard deviations match axes unit length.

Usage

eda_lm(
  dat,
  x,
  y,
  xlab = NULL,
  ylab = NULL,
  px = 1,
  py = 1,
  tukey = FALSE,
  show.par = TRUE,
  reg = TRUE,
  w = NULL,
  sd = TRUE,
  mean.l = TRUE,
  grey = 0.6,
  pch = 21,
  p.col = "grey50",
  p.fill = "grey80",
  size = 0.8,
  alpha = 0.8,
  q = FALSE,
  q.val = c(0.16, 0.84),
  q.type = 5,
  loe = FALSE,
  lm.col = rgb(1, 0.5, 0.5, 0.8),
  loe.col = rgb(0.3, 0.3, 1, 1),
  stats = FALSE,
  stat.size = 0.8,
  loess.d = list(family = "symmetric", span = 0.7, degree = 1),
  ...
)

Arguments

dat

Data frame.

x

Column assigned to the x axis.

y

Column assigned to the y axis.

xlab

X label for output plot.

ylab

Y label for output plot.

px

Power transformation to apply to the x-variable.

py

Power transformation to apply to the y-variable.

tukey

Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).

show.par

Boolean determining if power transformation should be displayed in the plot.

reg

Boolean indicating whether a least squares regression line should be plotted.

w

Weight to pass to regression model.

sd

Boolean determining if standard deviation lines should be plotted.

mean.l

Boolean determining if the x and y mean lines should be added to the plot.

grey

Grey level to apply to plot elements (0 to 1 with 1 = black).

pch

Point symbol type.

p.col

Color for point symbol.

p.fill

Point fill color passed to bg (Only used for pch ranging from 21-25).

size

Point size (0-1).

alpha

Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.

q

Boolean determining if grey quantile boxes should be plotted.

q.val

F-values to use to define the quantile box parameters. Defaults to mid 68 are used to generate the box.

q.type

Quantile type. Defaults to 5 (Cleveland's f-quantile definition).

loe

Boolean indicating if a loess curve should be fitted.

lm.col

Regression line color.

loe.col

LOESS curve color.

stats

Boolean indicating if regression summary statistics should be displayed.

stat.size

Text size of stats output in plot.

loess.d

A list of parameters passed to the loess.smooth function. A robust loess is used by default.

...

Not used.

Value

Returns residuals, intercept and slope from an OLS fit.

  • residuals: Regression model residuals

  • a: Intercept

  • b: Slope

Details

The function will plot an OLS regression line and, if requested, a loess fit. The plot will also display the +/- 1 standard deviations as dashed lines. In theory, if both x and y values follow a perfectly Normal distribution, roughly 68 percent of the points should fall in between these lines.

The true 68 percent of values can be displayed as grey rectangles by setting q=TRUE. This uses the quantile function to compute the upper and lower bounds defining the inner 68 percent of values. If the data follow a Normal distribution, the grey rectangle edges should coincide with the +/- 1SD dashed lines.
If you wish to show the interquartlie ranges (IQR) instead of the inner 68 percent of values, simply set q.val = c(0.25,0.75).

The plot has the option to re-express the values via the px and py arguments. But note that if the re-expression produces NaN values (such as if a negative value is logged) those points will be removed from the plot. This will result in fewer observations being plotted. If observations are removed as result of a re-expression a warning message will be displayed in the console.
The re-expression powers are shown in the upper right side of the plot. To suppress the display of the re-expressions set show.par = FALSE.

See also

plot and loess.smooth functions

Examples


# Add a regular (OLS) regression model and loess smooth to the data
eda_lm(mtcars, wt, mpg, loe = TRUE)

#> (Intercept)           x 
#>   37.285126   -5.344472 

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE)

#> (Intercept)           x 
#>   37.285126   -5.344472 

# Show the IQR box
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE, sd = FALSE, q.val = c(0.25,0.75))

#> (Intercept)           x 
#>   37.285126   -5.344472 

# Fit an OLS to the Income for Female vs Male
df2 <- read.csv("https://mgimond.github.io/ES218/Data/Income_education.csv")
eda_lm(df2, x=B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            loe = TRUE)

#>  (Intercept)            x 
#> 10503.090485     1.086416 

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            q = TRUE)

#>  (Intercept)            x 
#> 10503.090485     1.086416 

# Apply a transformation to x and y axes: x -> 1/3 and y -> log
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            px = 1/3, py = 0, q = TRUE, loe = TRUE)

#> (Intercept)           x 
#>  8.58646713  0.02287702