eda_qq
Generates an empirical or Normal QQ plot as well
as a Tukey mean-difference plot.
Usage
eda_qq(
x,
y = NULL,
fac = NULL,
norm = FALSE,
p = 1L,
tukey = FALSE,
md = FALSE,
q.type = 5,
fx = NULL,
fy = NULL,
plot = TRUE,
show.par = TRUE,
grey = 0.6,
pch = 21,
p.col = "grey50",
p.fill = "grey80",
size = 0.8,
alpha = 0.8,
q = TRUE,
b.val = c(0.25, 0.75),
l.val = c(0.125, 0.875),
xlab = NULL,
ylab = NULL,
title = NULL,
t.size = 1.2,
...
)
Arguments
- x
Vector for first variable or a dataframe.
- y
Vector for second variable or column defining the continuous variable if x is a dataframe.
- fac
Column defining the grouping variable if x is a dataframe.
- norm
Boolean determining if a Normal QQ plot is to be generated.
- p
Power transformation to apply to both sets of values.
- tukey
Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).
- md
Boolean determining if Tukey mean-difference plot should be generated.
- q.type
An integer between 1 and 9 selecting one of the nine quantile algorithms (see the quantile() function).
- fx
Formula to apply to x variable. This is computed after any transformation is applied to the x variable.
- fy
Formula to apply to y variable. This is computed after any transformation is applied to the y variable.
- plot
Boolean determining if plot should be generated.
- show.par
Boolean determining if parameters such as power transformation or formula should be displayed.
- grey
Grey level to apply to plot elements (0 to 1 with 1 = black).
- pch
Point symbol type.
- p.col
Color for point symbol.
- p.fill
Point fill color passed to bg (only used for pch values 21 to 25).
- size
Point size (0 to 1).
- alpha
Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.
- q
Boolean determining if grey quantile boxes should be plotted.
- b.val
Quantiles to define the quantile box parameters. Defaults to the IQR. Two values are needed.
- l.val
Quantiles to define the quantile line parameters. Defaults to the mid 75% of values. Two values are needed.
- xlab
X label for output plot. Ignored if x is a dataframe.
- ylab
Y label for output plot. Ignored if x is a dataframe.
- title
Title to add to plot.
- t.size
Title size.
- ...
Not used
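The p and tukey arguments choose between two families of power re-expressions. The sketch below uses the common textbook definitions of each family; power_re() is a hypothetical helper, and whether eda_qq() treats negative powers or non-positive values exactly this way is an assumption.

```r
# Hypothetical helper: textbook power re-expressions (not eda_qq()'s own code).
# Assumption: values are strictly positive, as both families require.
power_re <- function(x, p = 1, tukey = FALSE) {
  if (any(x <= 0)) stop("power re-expression assumes strictly positive values")
  if (p == 0) return(log(x))   # both families use the log at p = 0
  if (tukey) {
    # Tukey ladder of powers: x^p, sign-flipped for p < 0 so order is preserved
    sign(p) * x^p
  } else {
    # Box-Cox: (x^p - 1) / p, which approaches log(x) as p approaches 0
    (x^p - 1) / p
  }
}

x <- c(1, 4, 9, 16)
power_re(x, p = 0.5, tukey = TRUE)  # square roots: 1 2 3 4
power_re(x, p = 0.5)                # Box-Cox: 0 2 4 6
```

Both families preserve the ordering of the values; Box-Cox additionally rescales so that different powers remain comparable near x = 1.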
Value
Returns a list with the following components:
x: X values. May be interpolated to the size of the smaller batch. Values will reflect the power transformation defined in p.
b: Y values. May be interpolated to the size of the smaller batch. Values will reflect the power transformation defined in p.
p: Re-expression applied to original values.
fx: Formula applied to x variable.
fy: Formula applied to y variable.
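When the two batches differ in size, the returned values may be interpolated to the smaller batch. A minimal sketch of that idea, assuming the larger batch is evaluated at the smaller batch's f-values with quantile() type 5 (the q.type default); qq_pairs() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: pair quantiles of two batches of unequal size.
# Assumption: f-values of the smaller batch are (i - 0.5) / n, matching type 5.
qq_pairs <- function(x, y, q.type = 5) {
  n <- min(length(x), length(y))
  f <- (seq_len(n) - 0.5) / n   # plotting positions of the smaller batch
  # With type 5, the smaller batch evaluated at its own f-values returns its
  # sorted values; the larger batch is interpolated between its order statistics.
  list(
    x = unname(quantile(x, probs = f, type = q.type)),
    y = unname(quantile(y, probs = f, type = q.type))
  )
}

qq_pairs(c(3, 1, 2), 1:6)  # x: 1 2 3; y: 1.5 3.5 5.5
```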
Details
When the function is used to generate an empirical QQ plot, the plot
will display the IQR via grey boxes for both x and y values. The box
widths can be changed via the b.val argument. The plot will also
display the mid 75% of values via light-colored dashed lines. The line
positions can be changed via the l.val argument. The middle dashed
line represents each batch's median value. Console output prints the
suggested multiplicative and additive offsets. See the QQ plot vignette
for an introduction to its use and interpretation.
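The box and dashed-line positions are quantiles of each batch. A minimal sketch of how b.val and l.val map to those positions, assuming type 5 quantiles; qq_guides() is a hypothetical helper used only for illustration.

```r
# Hypothetical helper: reference positions drawn on an empirical QQ plot.
# Assumption: boxes span the b.val quantiles; dashed lines sit at the
# l.val quantiles and at the median; all use quantile() type 5.
qq_guides <- function(v, b.val = c(0.25, 0.75), l.val = c(0.125, 0.875),
                      q.type = 5) {
  list(
    box  = unname(quantile(v, probs = b.val, type = q.type)),  # IQR by default
    line = unname(quantile(v, probs = c(l.val[1], 0.5, l.val[2]),
                           type = q.type))                     # mid 75% + median
  )
}

# Guide positions for two iris petal-width batches
pw <- split(iris$Petal.Width, iris$Species)
qq_guides(pw$setosa)
qq_guides(pw$virginica)
```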
The function can also be used to generate a Normal QQ plot when the
norm argument is set to TRUE. In such a case, the line parameters
l.val are overridden and set to +/- 1 standard deviation. Note that
in this mode the "suggested offsets" output is disabled and an M-D
version of the Normal QQ plot cannot be generated. The formula
arguments are also ignored in this mode.
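A Normal QQ plot pairs the sorted values with standard Normal quantiles. The base-R sketch below assumes type 5 f-values, (i - 0.5) / n, and reads "+/- 1 standard deviation" as the f-values pnorm(-1) and pnorm(1); normal_qq() is a hypothetical helper, not eda_qq()'s implementation.

```r
# Hypothetical helper: coordinates of a Normal QQ plot.
# Assumption: f-values follow the type 5 convention (i - 0.5) / n.
normal_qq <- function(v) {
  v <- sort(v)
  f <- (seq_along(v) - 0.5) / length(v)
  data.frame(theoretical = qnorm(f), sample = v)
}

# f-values matching +/- 1 standard deviation (about 0.159 and 0.841)
pnorm(c(-1, 1))

nq <- normal_qq(iris$Sepal.Width)
plot(nq$theoretical, nq$sample, xlab = "Normal quantile", ylab = "Sepal width")
abline(v = c(-1, 1), lty = 2, col = "grey60")  # +/- 1 SD guides
```

Points close to a straight line suggest the batch is roughly Normal.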
Examples
# Passing data as a dataframe
singer <- lattice::singer
dat <- singer[singer$voice.part %in% c("Bass 2", "Tenor 1"), ]
eda_qq(dat, height, voice.part)
#> [1] "Suggested offsets:y = x * 0.8571 + (12.4286)"
# Passing data as two separate vector objects
bass2 <- subset(singer, voice.part == "Bass 2", select = height, drop = TRUE)
tenor1 <- subset(singer, voice.part == "Tenor 1", select = height, drop = TRUE)
eda_qq(bass2, tenor1)
#> [1] "Suggested offsets:y = x * 1.04 + (-5.2163)"
# There seems to be an additive offset of about 2 inches
eda_qq(bass2, tenor1, fx = "x - 2")
#> [1] "Suggested offsets:y = x * 1.04 + (-5.2163)"
# We can fine-tune by generating the Tukey mean-difference plot
eda_qq(bass2, tenor1, fx = "x - 2", md = TRUE)
#> [1] "Suggested offsets:y = x * 1.04 + (-5.2163)"
# An offset of another 0.5 inches seems warranted
# We can say that, overall, bass2 singers are 2.5 inches taller than tenor1 singers.
# The offset is additive.
eda_qq(bass2, tenor1, fx = "x - 2.5", md = TRUE)
#> [1] "Suggested offsets:y = x * 1.04 + (-5.2163)"
# Example 2: Petal width
setosa <- subset(iris, Species == "setosa", select = Petal.Width, drop = TRUE)
virginica <- subset(iris, Species == "virginica", select = Petal.Width, drop = TRUE)
eda_qq(setosa, virginica)
#> [1] "Suggested offsets:y = x * 1.7143 + (1.6286)"
# The points are not completely parallel to the 1:1 line, suggesting a
# multiplicative offset. The slope may be difficult to eyeball, but the
# function outputs a suggested slope and intercept. We can start with the slope.
eda_qq(setosa, virginica, fx = "x * 1.7143")
#> [1] "Suggested offsets:y = x * 1.7143 + (1.6286)"
# Now let's add the suggested additive offset.
eda_qq(setosa, virginica, fx = "x * 1.7143 + 1.6286")
#> [1] "Suggested offsets:y = x * 1.7143 + (1.6286)"
# We can confirm this value via the mean-difference plot
# Overall, we have both a multiplicative and additive offset between the
# species' petal widths.
eda_qq(setosa, virginica, fx = "x * 1.7143 + 1.6286", md = TRUE)
#> [1] "Suggested offsets:y = x * 1.7143 + (1.6286)"
# Function can also generate a Normal QQ plot
eda_qq(bass2, norm = TRUE)
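The mean-difference step in Example 2 can be approximated in base R. This sketch assumes both batches have 50 values, so sorted values pair directly without interpolation, and that the difference is taken as y - x; the fx string is applied here by ordinary arithmetic rather than by eda_qq().

```r
# Base-R sketch of a Tukey mean-difference (M-D) plot for the iris example.
# Assumption: difference defined as y - x; eda_qq() may orient it differently.
setosa    <- sort(iris$Petal.Width[iris$Species == "setosa"])
virginica <- sort(iris$Petal.Width[iris$Species == "virginica"])

x <- setosa * 1.7143 + 1.6286   # same re-expression as fx = "x * 1.7143 + 1.6286"
y <- virginica

md <- data.frame(mean = (x + y) / 2, diff = y - x)
plot(md$mean, md$diff, xlab = "Mean", ylab = "Difference (y - x)")
abline(h = 0, lty = 2)  # differences scattered around 0 suggest the offsets fit

summary(md$diff)
```

Plotting differences against means turns a check for a 45-degree line into a check for a horizontal line, which makes small residual offsets easier to see.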