eda_boxls
creates boxplots conditioned on one variable
while providing the option to equalize levels and/or spreads.
Usage
eda_boxls(
dat,
x,
fac,
p = 1,
tukey = FALSE,
outlier = TRUE,
out.txt = NULL,
type = "none",
notch = FALSE,
horiz = FALSE,
outliers = TRUE,
xlab = NULL,
ylab = NULL,
grey = 0.6,
reorder = TRUE,
reorder.stat = "median"
)
Arguments
- dat
Data frame
- x
Column name assigned to the values
- fac
Column name assigned to the factor the values are to be conditioned on
- p
Power transformation to apply to variable
- tukey
Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation)
- outlier
Boolean indicating if outliers should be plotted
- out.txt
Column whose values are to be used to label outliers
- type
Plot type. "none" = no equalization; "l" = equalize by level; "ls" = equalize by both level and spread
- notch
Boolean determining if notches should be added.
- horiz
Boolean determining if the plot is drawn horizontally (TRUE) or vertically (FALSE)
- outliers
Plot outliers (TRUE) or not (FALSE)
- xlab
X label for output plot
- ylab
Y label for output plot
- grey
Grey level to apply to plot elements (0 to 1 with 1 = black)
- reorder
Boolean determining if factor levels are to be reordered based on the median, upper quartile or lower quartile (set in reorder.stat).
- reorder.stat
Statistic to reorder levels by if reorder is set to TRUE. Either "median", "upper" (for upper quartile) or "lower" (for lower quartile). If type is set to a value other than "none", then this argument is ignored and the stat defaults to "median".
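The p and tukey arguments select a power transformation. As a rough illustration, the two families can be sketched in base R using their usual textbook definitions. The helper power_transform() is hypothetical and is not part of the package, whose internal implementation may differ (for example, in how negative powers are handled).

```r
# Hypothetical helper (not part of the package): textbook forms of the
# Tukey and Box-Cox power transformations for positive values.
power_transform <- function(x, p = 1, tukey = FALSE) {
  if (p == 0) return(log(x))   # both families fall back to the log at p = 0
  if (tukey) {
    x^p                        # Tukey: plain power
  } else {
    (x^p - 1) / p              # Box-Cox: shifted and rescaled power
  }
}

x <- c(1, 4, 9, 16)
power_transform(x, p = 0.5, tukey = TRUE)    # 1 2 3 4
power_transform(x, p = 0.5, tukey = FALSE)   # 0 2 4 6
power_transform(x, p = 0)                    # natural log of x
```

Both forms preserve the ordering of positive values for positive p, so the choice mainly affects the scale shown on the value axis.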
Details
By default, the boxplots are re-ordered by their median values.
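The reordering can be approximated outside the function with base R's reorder(). This is only a sketch of the idea, assuming an ascending sort on the chosen statistic; how eda_boxls orders the levels internally is not shown here. The helper upper_q() is introduced for illustration only.

```r
# Order the levels of cyl by a per-group statistic of mpg (base R only).
by_median <- reorder(factor(mtcars$cyl), mtcars$mpg, FUN = median)
levels(by_median)   # "8" "6" "4" (ascending group medians)

# Same idea using the upper quartile instead of the median.
# upper_q() is a hypothetical helper, not a package function.
upper_q <- function(v) unname(quantile(v, 0.75))
by_upper <- reorder(factor(mtcars$cyl), mtcars$mpg, FUN = upper_q)
levels(by_upper)    # "8" "6" "4"
```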
If outliers are labeled with their own values (out.txt set to the plotted column), the labels keep the original values even when the data are equalized by level or spread.
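To make the equalization concrete, here is a minimal base R sketch. It assumes that "level" means the batch median and "spread" the interquartile range; the package may use a different measure of spread. Because the original column is left untouched, labels drawn from it still show the original values.

```r
# Assumption: level = batch median, spread = batch interquartile range.
mpg <- mtcars$mpg
cyl <- factor(mtcars$cyl)

# Equalize by level: subtract each batch's median from its values.
mpg_l <- mpg - ave(mpg, cyl, FUN = median)

# Equalize by level and spread: also divide by each batch's IQR.
mpg_ls <- mpg_l / ave(mpg, cyl, FUN = IQR)

tapply(mpg_l, cyl, median)   # every batch median is now 0
tapply(mpg_ls, cyl, IQR)     # every batch IQR is now 1
```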
Note that the notch offers a 95 percent test of the null hypothesis that the true medians are equal, assuming that the distribution of each batch is approximately normal. If the notches of two batches do not overlap, their medians can be taken to differ significantly at the 0.05 level. Note that the notches do not correct for multiple comparison issues when three or more batches are plotted.
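For reference, notch limits can be computed by hand. The sketch below follows the convention documented for base R's boxplot.stats(), namely the median plus or minus 1.58 times the hinge spread divided by the square root of the batch size. Whether eda_boxls uses exactly this convention is an assumption, and notch_limits() is a hypothetical helper.

```r
# Hypothetical helper: notch limits for one batch, following the
# boxplot.stats() convention: median +/- 1.58 * hinge spread / sqrt(n).
notch_limits <- function(v) {
  five <- fivenum(v)   # min, lower hinge, median, upper hinge, max
  five[3] + c(-1.58, 1.58) * (five[4] - five[2]) / sqrt(length(v))
}

# One column per batch; row 1 = lower limit, row 2 = upper limit.
notches <- sapply(split(mtcars$mpg, mtcars$cyl), notch_limits)
notches

# Cross-check against base R for the 4-cylinder batch.
all.equal(notches[, "4"], boxplot.stats(mtcars$mpg[mtcars$cyl == 4])$conf)
```

Two batches whose intervals in this matrix do not overlap correspond to non-overlapping notches in the plot.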
Examples
# A basic boxplot. The outlier is labeled with the row number by default.
eda_boxls(mtcars, mpg, cyl, type = "none")
# A basic boxplot. The outlier is labeled with its own value.
eda_boxls(mtcars, mpg, cyl, type = "none", out.txt = mpg)
# Boxplot equalized by level. Note that the outlier text is labeled with its
# original value.
eda_boxls(mtcars, mpg, cyl, type = "l", out.txt = mpg)
#> ========================
#> Note that the data have been equalized with "type" set to "l".
#> ========================
# Boxplots equalized by level and spread
eda_boxls(mtcars, mpg, cyl, type = "ls", out.txt = mpg)
#> ========================
#> Note that the data have been equalized with "type" set to "ls".
#> ========================
# Hide outliers
eda_boxls(mtcars, mpg, cyl, type = "ls", out.txt = mpg, outlier = FALSE)
#> ========================
#> Note that the data have been equalized with "type" set to "ls".
#> ========================
# Equalizing level helps visualize increasing spread with increasing
# median value
food <- read.csv("http://mgimond.github.io/ES218/Data/Food_web.csv")
eda_boxls(food, mean.length, dimension, type = "l")
#> ========================
#> Note that the data have been equalized with "type" set to "l".
#> ========================
# For long factor level names, flip plot
eda_boxls(iris, Sepal.Length, Species, out.txt = Sepal.Length, horiz = TRUE)
# By default, plots are ordered by their medians.
singer <- lattice::singer
eda_boxls(singer, height, voice.part, out.txt = height, horiz = TRUE)
# To order by upper quartile, set reorder.stat to "upper"
eda_boxls(singer, height, voice.part, out.txt = height, horiz = TRUE,
          reorder.stat = "upper")