Skip to contents

eda_lm generates a scatter plot with a fitted regression line. A loess line can also be added to the plot for model comparison. The axes are scaled such that their respective standard deviations match axes unit length.

Usage

eda_lm(
  dat,
  x,
  y,
  xlab = NULL,
  ylab = NULL,
  px = 1,
  py = 1,
  tukey = FALSE,
  show.par = TRUE,
  reg = TRUE,
  poly = 1,
  robust = FALSE,
  w = NULL,
  sd = TRUE,
  mean.l = TRUE,
  asp = TRUE,
  grey = 0.6,
  pch = 21,
  p.col = "grey50",
  p.fill = "grey80",
  size = 0.8,
  alpha = 0.8,
  q = FALSE,
  q.val = c(0.16, 0.84),
  q.type = 5,
  loe = FALSE,
  lm.col = rgb(1, 0.5, 0.5, 0.8),
  loe.col = rgb(0.3, 0.3, 1, 1),
  stats = FALSE,
  stat.size = 0.8,
  loess.d = list(family = "symmetric", span = 0.7, degree = 1),
  rlm.d = list(psi = "psi.bisquare"),
  ...
)

Arguments

dat

Dataframe.

x

Column assigned to the x axis.

y

Column assigned to the y axis.

xlab

X label for output plot.

ylab

Y label for output plot.

px

Power transformation to apply to the x-variable.

py

Power transformation to apply to the y-variable.

tukey

Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).

show.par

Boolean determining if power transformation should be displayed in the plot.

reg

Boolean indicating whether a least squares regression line should be plotted.

poly

Polynomial order.

robust

Boolean indicating if robust regression should be used.

w

Weight to pass to regression model.

sd

Boolean determining if standard deviation lines should be plotted.

mean.l

Boolean determining if the x and y mean lines should be added to the plot.

asp

Boolean determining if the plot aspect ratio should equal the ratio of the x and y standard deviations. A value of FALSE defaults to the base plot's default aspect ratio. A value of TRUE uses the aspect ratio sd(x)/sd(y).

grey

Grey level to apply to plot elements (0 to 1 with 1 = black).

pch

Point symbol type.

p.col

Color for point symbol.

p.fill

Point fill color passed to bg (Only used for pch ranging from 21-25).

size

Point size (0-1).

alpha

Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.

q

Boolean determining if shaded region showing the mid-portion of the data should be added to the plot.

q.val

Upper and lower bounds of the shaded region shown by q. Defaults to the mid 68 fractions ranging from 0 to 1 and are passed to the argument via a c() function. Defaults to c(0.16,0.84).

q.type

Quantile type. Defaults to 5 (Cleveland's f-quantile definition).

loe

Boolean indicating if a loess curve should be fitted.

lm.col

Regression line color.

loe.col

LOESS curve color.

stats

Boolean indicating if regression summary statistics should be displayed.

stat.size

Text size of stats output in plot.

loess.d

A list of arguments passed to the loess.smooth function. A robust loess is used by default.

rlm.d

A list of arguments passed to the MASS::rlm function.

...

Not used.

Value

Returns a list of class eda_lm. Output includes the following if reg = TRUE. Returns NULL otherwise.

  • residuals: Regression model residuals

  • a: Intercept

  • b: Polynomial coefficient(s)

  • fitted.values: Fitted values

Details

The function will plot a regression line and, if requested, a loess fit. The function adopts the least squares fitting technique by default. It defaults to a first order polynomial fit. The polynomial order can be specified via the poly argument.

The plot displays the +/- 1 standard deviations as dashed lines. In theory, if both x and y values follow a perfectly Normal distribution, roughly 68 percent of the points should fall in between these lines.

The true 68 percent of values can be displayed as a shaded region by setting q=TRUE. It uses the quantile function to compute the upper and lower bounds defining the inner 68 percent of values. If the data follow a Normal distribution, the grey rectangle edges should coincide with the +/- 1SD dashed lines. If you wish to show the interquartile ranges (IQR) instead of the inner 68 percent of values, simply set q.val = c(0.25,0.75).

The plot has the option to re-express the values via the px and py arguments. But note that if the re-expression produces NaN values (such as if a negative value is logged) those points will be removed from the plot. This will result in fewer observations being plotted. If observations are removed as result of a re-expression a warning message will be displayed in the console. The re-expression powers are shown in the upper right side of the plot. To suppress the display of the re-expressions set show.par = FALSE.

If the robust argument is set to TRUE, MASS's built-in robust fitting model, rlm, is used to fit the regression line to the data. rlm arguments can be passed as a list via the rlm.d argument.

See also

plot and loess.smooth functions

Examples


# Add a regular (OLS) regression model and loess smooth to the data
eda_lm(mtcars, wt, mpg, loe = TRUE)

#>       int      wt^1 
#> 37.285126 -5.344472 

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE)

#>       int      wt^1 
#> 37.285126 -5.344472 

# Show the IQR box
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE, sd = FALSE, q.val = c(0.25,0.75))

#>       int      wt^1 
#> 37.285126 -5.344472 

# Fit an OLS to income for Female vs Male
df2 <- read.csv("https://mgimond.github.io/ES218/Data/Income_education.csv")
eda_lm(df2, x=B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            loe = TRUE)

#>          int     Female^1 
#> 10503.090485     1.086416 

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            q = TRUE)

#>          int     Female^1 
#> 10503.090485     1.086416 

# Apply a transformation to x and y axes: x -> 1/3 and y -> log
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            px = 1/3, py = 0, q = TRUE, loe = TRUE)

#>        int   Female^1 
#> 8.58646713 0.02287702 

# Fit a second order polynomial
eda_lm(mtcars, hp, mpg, poly = 2)

#>           int          hp^1          hp^2 
#> 40.4091172029 -0.2133082599  0.0004208156 

# Fit a robust regression model
eda_lm(mtcars, hp, mpg, robust = TRUE, poly = 2)

#>           int          hp^1          hp^2 
#> 39.3003734539 -0.2062942523  0.0004113048