eda_qqmat Generates a matrix of empirical QQ plots

Usage

eda_qqmat(
  dat,
  x,
  fac,
  p = 1L,
  tukey = FALSE,
  q.type = 5,
  diag = TRUE,
  xylim = NULL,
  resid = FALSE,
  stat = mean,
  plot = TRUE,
  grey = 0.6,
  pch = 21,
  p.col = "grey40",
  p.fill = "grey60",
  size = 1,
  text.size = 1,
  tail.pch = 21,
  tail.p.col = "grey70",
  tail.p.fill = NULL,
  tic.size = 0.7,
  alpha = 0.8,
  q = FALSE,
  tails = TRUE,
  med = TRUE,
  inner = 0.75,
  ...
)

Arguments

dat

Data frame.

x

Continuous variable.

fac

Categorical variable.

p

Power transformation to apply to the continuous variable.

tukey

Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).

q.type

An integer between 1 and 9 selecting one of the nine quantile algorithms (see the quantile function).

diag

Boolean determining if both the upper and lower triangles of the matrix should be plotted. If set to FALSE, only the lower triangle is plotted.

xylim

X and Y axes limits.

resid

Boolean determining if residuals should be plotted. Residuals are computed using the stat parameter.

stat

Statistic to use if residuals are to be computed. Currently mean (default) or median.

plot

Boolean determining if plot should be generated.

grey

Grey level to apply to plot elements (0 to 1 with 1 = black).

pch

Point symbol type.

p.col

Color for point symbol.

p.fill

Point fill color passed to bg (Only used for pch ranging from 21-25).

size

Point symbol size (0-1).

text.size

Size for category text in diagonal box.

tail.pch

Tail-end point symbol type (See tails).

tail.p.col

Tail-end color for point symbol (See tails).

tail.p.fill

Tail-end point fill color passed to bg (Only used for tail.pch ranging from 21-25).

tic.size

Size of tic labels (defaults to 0.7).

alpha

Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.

q

Boolean determining if grey box highlighting the inner region should be displayed.

tails

Boolean determining if points outside of the inner region should be symbolized differently. Tail-end points are symbolized via the tail.pch, tail.p.col and tail.p.fill arguments.

med

Boolean determining if median lines should be drawn.

inner

Fraction of mid-values to highlight in q or tails. Defaults to the inner 75% of values.

...

Not used

Value

Returns a list with the following components:

  • data: List with the input x and y values for each group. When batch sizes differ, values may be interpolated to match the size of the smaller batch. Values reflect the power transformation defined in p.

  • p: Transformation applied to original values.
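The interpolation mentioned for the data component can be sketched in base R. This is a minimal illustration, not the package's internal code: qq_pair is a hypothetical helper, and the plotting positions (i - 0.5) / n are an assumption chosen to match the default q.type = 5.

```r
# Hypothetical helper (not part of the package): pair up the quantiles of two
# batches of unequal size so they can be plotted against each other.
qq_pair <- function(x, y, q.type = 5) {
  n <- min(length(x), length(y))
  # Assumed plotting positions, matching quantile type 5: (i - 0.5) / n
  probs <- (seq_len(n) - 0.5) / n
  list(
    x = as.numeric(quantile(x, probs, type = q.type)),
    y = as.numeric(quantile(y, probs, type = q.type))
  )
}

set.seed(1)
a <- rnorm(10)   # smaller batch
b <- rnorm(25)   # larger batch, interpolated down to 10 quantiles
pair <- qq_pair(a, b)
lengths(pair)
```

With these positions and q.type = 5, the smaller batch is returned as its sorted values and only the larger batch is interpolated.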

Details

The function generates an empirical QQ plot matrix from a data frame of continuous values and matching categories. It is designed to place emphasis on the mid portion of the data, whose range is defined by inner (the inner fraction of the data). By default, points outside of the mid portion are symbolized differently. You can also highlight the mid region in light grey by setting q = TRUE. The medians of both batches are shown as vertical and horizontal dashed lines. For a plain vanilla QQ plot matrix, remove all guides by setting tails = FALSE and med = FALSE.
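As a rough illustration of how inner maps to a range of values, the sketch below computes the bounds of the mid region for a single batch. It assumes the region is symmetric about the median (equal probability in each tail) and uses quantile type 5; the function's own computation may differ.

```r
# Illustrative only: bounds of the inner region for one batch of values.
inner <- 0.75
tail_prob <- (1 - inner) / 2            # probability left in each tail
x <- iris$Petal.Length

# Lower and upper bounds of the mid portion (assumed symmetric)
bounds <- quantile(x, probs = c(tail_prob, 1 - tail_prob), type = 5)

# TRUE for points inside the mid region, FALSE for tail-end points
in_mid <- x >= bounds[[1]] & x <= bounds[[2]]
table(in_mid)
```

Points flagged FALSE here correspond to the tail-end points that the tail.pch, tail.p.col and tail.p.fill arguments restyle.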

The QQ plot matrix is most effective in comparing residuals after the data are fitted by the mean or median. To plot the residuals, set resid=TRUE. By default, the mean is used. You can change the statistic to the median by setting stat=median.
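The residuals described above can be approximated in base R by subtracting each group's statistic from its values. This is a sketch of the idea, not the function's internal implementation.

```r
# Residuals after fitting a group-wise location (mean by default).
stat <- mean                             # swap in median to fit the median model
fitted <- ave(iris$Petal.Length, iris$Species, FUN = stat)
res <- iris$Petal.Length - fitted

# Each group's residuals are now centered on zero
grp_center <- tapply(res, iris$Species, stat)
round(grp_center, 10)
```

Centering every batch on zero removes differences in location, so the QQ plots compare spread and shape only.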

The function also allows for batch transformation of values via the p argument. The transformation is applied to the data prior to computing the residuals.
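For reference, the two families selected by the tukey argument can be sketched with their textbook definitions. re_express is a hypothetical helper written for this illustration; the package may handle special cases (such as p = 1 or negative powers) differently.

```r
# Hypothetical helper, not the package's internal code.
# Box-Cox: (x^p - 1) / p;  Tukey: x^p;  both use log(x) when p = 0.
re_express <- function(x, p = 1, tukey = FALSE) {
  if (p == 0) return(log(x))
  if (tukey) x^p else (x^p - 1) / p
}

x <- c(1, 2, 4, 8)
re_express(x, p = 0)                  # log transform
re_express(x, p = 0.5)                # Box-Cox square root
re_express(x, p = 0.5, tukey = TRUE)  # Tukey square root
```

The Box-Cox form is a shifted and rescaled version of the Tukey form, so both produce the same shape in a QQ plot; only the axis values differ.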

References

  • John M. Chambers, William S. Cleveland, Beat Kleiner, Paul A. Tukey. Graphical Methods for Data Analysis (1983)

Examples


# Default output
singer <- lattice::singer
eda_qqmat(singer, height, voice.part)


# Limit to lower triangular matrix
eda_qqmat(singer, height, voice.part, diag = FALSE)


# Plot residuals after fitting mean model
eda_qqmat(singer, height, voice.part, resid = TRUE)


# Generate plain vanilla QQ plot matrix
eda_qqmat(mtcars, mpg, cyl, resid = TRUE, tails = FALSE, med = FALSE)


# Log transform the data, then plot the residuals after fitting the mean model
eda_qqmat(iris, Petal.Length, Species, resid = TRUE, p = 0)
#> Note that a power transformation of 0 was applied to the data before they were processed for the plot.


# Fit the median model instead of the mean
eda_qqmat(iris, Petal.Length, Species, resid = TRUE, p = 0, stat = median)
#> Note that a power transformation of 0 was applied to the data before they were processed for the plot.


# Fill inner region with grey boxes
eda_qqmat(iris, Petal.Length, Species, resid = TRUE, q = TRUE, p = 0)
#> Note that a power transformation of 0 was applied to the data before they were processed for the plot.


# Change tail point symbol
eda_qqmat(iris, Petal.Length, Species, resid = TRUE, p = 0, tail.pch = 3)
#> Note that a power transformation of 0 was applied to the data before they were processed for the plot.


# Change inner region point symbols to dark orange and reduce size of all
# point symbols
eda_qqmat(iris, Petal.Length, Species, resid = TRUE, p = 0, size = 0.8,
          tail.pch = 3, p.fill = "darkorange")
#> Note that a power transformation of 0 was applied to the data before they were processed for the plot.