base
vs tidyverse
vs data.table
data.table
ecosystemThis write-up highlights differences between the base
, tidyverse
and data.table
ecosystems as it pertains to data manipulation.
The tidyverse
is itself a package that includes eight core packages however, for the purpose of this tutorial, emphasis will be placed on two of its core packages, dplyr
and tidyr
, which are used for most data manipulation operations. The tidyverse
has its own website, https://www.tidyverse.org/, and it is mostly maintained by RStudio staff.
The data.table
has its own website too, https://rdatatable.gitlab.io/data.table/. It was originally developed by Matt Dowle in 2008 and is now maintained by him and a large community of contributors. data.table
is also available for Python as (pydatatable)[https://github.com/h2oai/datatable]
Before continuing, make sure to attach the following packages.
# Load the two core tidyverse packages
library(dplyr)
library(tidyr)
# Load the data.table package
library(data.table)
Environment | base | tidyverse | data.table |
---|---|---|---|
Pros |
|
|
|
Cons |
|
dtplyr , can match data.table speed
|
|
Environment | base | tidyverse | data.table |
---|---|---|---|
Supported data class(es) |
data.frame
|
data.frame , tibble
|
data.table
|
Reading data |
read.csv
|
read_csv
|
fread
|
Subset by column |
[ , ...]
|
select()
|
[ ,... , ]
|
Subset by rows |
[... , ]
|
filter()
|
[... , , ]
|
Create new column |
df$y = ...
|
mutate(tb, y = ...)
|
[ , y := ..., ]
|
Delete a column |
df$y = NULL
|
select(tb, -y)
|
[ , y := NULL, ]
|
Summarize |
apply(df[ , y], 2, ...)
|
summarise()
|
[ , ...(y), ]
|
Grouping |
aggregate()
|
group_by()
|
[ , , by = ...]
|
Pivot to long |
reshape()
|
pivot_longer()
|
melt()
|
Pivot to wide |
reshape()
|
pivot_wider()
|
dcast()
|
Joining tables |
merge()
|
left_join()
|
DT1[ DT2, on = ...]
|
Both base
functions and tidyverse
functions will accept a data.frame
as input. tidyverse
functions will also accept the tidyverse
’s native tibble
format. In the following exercises, a tibble
will be passed to all tidyverse
functions if only to distinguish a tidyverse operation from a base
or data.table
operation.
A data.table
, however, requires that the input table be in a native data.table
format. But a data.table
can be used as input into both base
and tidyverse
functions. In the examples that follow, data.table
tables will be limited to data.table
operations to distinguish such operations from those in base
and tidyverse.
mpg | cyl | hp | am | car |
---|---|---|---|---|
21.0 | 6 | 110 | 1 | Mazda RX4 |
21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
22.8 | 4 | 93 | 1 | Datsun 710 |
21.4 | 6 | 110 | 0 | Hornet 4 Drive |
18.7 | 8 | 175 | 0 | Hornet Sportabout |
18.1 | 6 | 105 | 0 | Valiant |
14.3 | 8 | 245 | 0 | Duster 360 |
24.4 | 4 | 62 | 0 | Merc 240D |
22.8 | 4 | 95 | 0 | Merc 230 |
19.2 | 6 | 123 | 0 | Merc 280 |
17.8 | 6 | 123 | 0 | Merc 280C |
16.4 | 8 | 180 | 0 | Merc 450SE |
17.3 | 8 | 180 | 0 | Merc 450SL |
15.2 | 8 | 180 | 0 | Merc 450SLC |
10.4 | 8 | 205 | 0 | Cadillac Fleetwood |
10.4 | 8 | 215 | 0 | Lincoln Continental |
14.7 | 8 | 230 | 0 | Chrysler Imperial |
32.4 | 4 | 66 | 1 | Fiat 128 |
30.4 | 4 | 52 | 1 | Honda Civic |
33.9 | 4 | 65 | 1 | Toyota Corolla |
21.5 | 4 | 97 | 0 | Toyota Corona |
15.5 | 8 | 150 | 0 | Dodge Challenger |
15.2 | 8 | 150 | 0 | AMC Javelin |
13.3 | 8 | 245 | 0 | Camaro Z28 |
19.2 | 8 | 175 | 0 | Pontiac Firebird |
27.3 | 4 | 66 | 1 | Fiat X1-9 |
26.0 | 4 | 91 | 1 | Porsche 914-2 |
30.4 | 4 | 113 | 1 | Lotus Europa |
15.8 | 8 | 264 | 1 | Ford Pantera L |
19.7 | 6 | 175 | 1 | Ferrari Dino |
15.0 | 8 | 335 | 1 | Maserati Bora |
21.4 | 4 | 109 | 1 | Volvo 142E |
data.table
ecosystemThe data.table
environment benefits from its concise syntax; this eliminates the need to memorize an arsenal of function names. In fact, most data manipulating operations are performed between brackets []
. However unlike the base
and tidyverse
environments, the data must be in a data.table
format. This can be accomplished by loading the the data file using the data.table
’s fread()
function, or by converting the data frame to a data.table
via the as.data.table()
function.
The bracket structure differs from that of the base dataframe bracket in that the data.table
bracket takes three indices as opposed to two.
The indices are referenced as follows:
i
: which rows and what (if anything) to do with these rows.j
: which columns and what (if anything) to do with these columns.by
: how to group records if j
is being summarized.From there, you can chain other operations.
Of interest to some, the j
index can take on many base R functions including plot()
.
For more information on the bracket environments, type:
?`[.data.table`
: To access the help page for data.table
’s []
environment.
?`[.data.frame`
: To access the help page for base
’s []
environment.
base R operations
tidyverse operations
data.table operations
df[ , c(“mpg”, “hp”)]
select(tb, mpg, hp)
dt[ , .(mpg, hp)]
mpg | hp |
---|---|
21.0 | 110 |
21.0 | 110 |
22.8 | 93 |
21.4 | 110 |
18.7 | 175 |
18.1 | 105 |
14.3 | 245 |
24.4 | 62 |
22.8 | 95 |
19.2 | 123 |
17.8 | 123 |
16.4 | 180 |
17.3 | 180 |
15.2 | 180 |
10.4 | 205 |
10.4 | 215 |
14.7 | 230 |
32.4 | 66 |
30.4 | 52 |
33.9 | 65 |
21.5 | 97 |
15.5 | 150 |
15.2 | 150 |
13.3 | 245 |
19.2 | 175 |
27.3 | 66 |
26.0 | 91 |
30.4 | 113 |
15.8 | 264 |
19.7 | 175 |
15.0 | 335 |
21.4 | 109 |
NOTE: in base R, if you extract a single column, the output will be a vector. To force the output to a dataframe, add drop = FALSE
as in df[ , "hp", drop = FALSE]
df[1:3 , ]
slice(tb, 1:3)
dt[1:3, , ]
mpg | cyl | hp | am | car |
---|---|---|---|---|
21.0 | 6 | 110 | 1 | Mazda RX4 |
21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
22.8 | 4 | 93 | 1 | Datsun 710 |
df[ df$mpg > 20, ]
filter(tb, mpg > 20 )
dt[ mpg > 20 , ]
mpg | cyl | hp | am | car | |
---|---|---|---|---|---|
1 | 21.0 | 6 | 110 | 1 | Mazda RX4 |
2 | 21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
3 | 22.8 | 4 | 93 | 1 | Datsun 710 |
4 | 21.4 | 6 | 110 | 0 | Hornet 4 Drive |
8 | 24.4 | 4 | 62 | 0 | Merc 240D |
9 | 22.8 | 4 | 95 | 0 | Merc 230 |
18 | 32.4 | 4 | 66 | 1 | Fiat 128 |
19 | 30.4 | 4 | 52 | 1 | Honda Civic |
20 | 33.9 | 4 | 65 | 1 | Toyota Corolla |
21 | 21.5 | 4 | 97 | 0 | Toyota Corona |
26 | 27.3 | 4 | 66 | 1 | Fiat X1-9 |
27 | 26.0 | 4 | 91 | 1 | Porsche 914-2 |
28 | 30.4 | 4 | 113 | 1 | Lotus Europa |
32 | 21.4 | 4 | 109 | 1 | Volvo 142E |
df[order(df$mpg) , ]
arrange(tb, mpg)
dt[order(mpg) , ]
mpg | cyl | hp | am | car | |
---|---|---|---|---|---|
15 | 10.4 | 8 | 205 | 0 | Cadillac Fleetwood |
16 | 10.4 | 8 | 215 | 0 | Lincoln Continental |
24 | 13.3 | 8 | 245 | 0 | Camaro Z28 |
7 | 14.3 | 8 | 245 | 0 | Duster 360 |
17 | 14.7 | 8 | 230 | 0 | Chrysler Imperial |
31 | 15.0 | 8 | 335 | 1 | Maserati Bora |
14 | 15.2 | 8 | 180 | 0 | Merc 450SLC |
23 | 15.2 | 8 | 150 | 0 | AMC Javelin |
22 | 15.5 | 8 | 150 | 0 | Dodge Challenger |
29 | 15.8 | 8 | 264 | 1 | Ford Pantera L |
12 | 16.4 | 8 | 180 | 0 | Merc 450SE |
13 | 17.3 | 8 | 180 | 0 | Merc 450SL |
11 | 17.8 | 6 | 123 | 0 | Merc 280C |
6 | 18.1 | 6 | 105 | 0 | Valiant |
5 | 18.7 | 8 | 175 | 0 | Hornet Sportabout |
10 | 19.2 | 6 | 123 | 0 | Merc 280 |
25 | 19.2 | 8 | 175 | 0 | Pontiac Firebird |
30 | 19.7 | 6 | 175 | 1 | Ferrari Dino |
1 | 21.0 | 6 | 110 | 1 | Mazda RX4 |
2 | 21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
4 | 21.4 | 6 | 110 | 0 | Hornet 4 Drive |
32 | 21.4 | 4 | 109 | 1 | Volvo 142E |
21 | 21.5 | 4 | 97 | 0 | Toyota Corona |
3 | 22.8 | 4 | 93 | 1 | Datsun 710 |
9 | 22.8 | 4 | 95 | 0 | Merc 230 |
8 | 24.4 | 4 | 62 | 0 | Merc 240D |
27 | 26.0 | 4 | 91 | 1 | Porsche 914-2 |
26 | 27.3 | 4 | 66 | 1 | Fiat X1-9 |
19 | 30.4 | 4 | 52 | 1 | Honda Civic |
28 | 30.4 | 4 | 113 | 1 | Lotus Europa |
18 | 32.4 | 4 | 66 | 1 | Fiat 128 |
20 | 33.9 | 4 | 65 | 1 | Toyota Corolla |
df[order(-df$mpg) , ]
arrange(tb, -mpg)
dt[order(-mpg) , ]
mpg | cyl | hp | am | car | |
---|---|---|---|---|---|
20 | 33.9 | 4 | 65 | 1 | Toyota Corolla |
18 | 32.4 | 4 | 66 | 1 | Fiat 128 |
19 | 30.4 | 4 | 52 | 1 | Honda Civic |
28 | 30.4 | 4 | 113 | 1 | Lotus Europa |
26 | 27.3 | 4 | 66 | 1 | Fiat X1-9 |
27 | 26.0 | 4 | 91 | 1 | Porsche 914-2 |
8 | 24.4 | 4 | 62 | 0 | Merc 240D |
3 | 22.8 | 4 | 93 | 1 | Datsun 710 |
9 | 22.8 | 4 | 95 | 0 | Merc 230 |
21 | 21.5 | 4 | 97 | 0 | Toyota Corona |
4 | 21.4 | 6 | 110 | 0 | Hornet 4 Drive |
32 | 21.4 | 4 | 109 | 1 | Volvo 142E |
1 | 21.0 | 6 | 110 | 1 | Mazda RX4 |
2 | 21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
30 | 19.7 | 6 | 175 | 1 | Ferrari Dino |
10 | 19.2 | 6 | 123 | 0 | Merc 280 |
25 | 19.2 | 8 | 175 | 0 | Pontiac Firebird |
5 | 18.7 | 8 | 175 | 0 | Hornet Sportabout |
6 | 18.1 | 6 | 105 | 0 | Valiant |
11 | 17.8 | 6 | 123 | 0 | Merc 280C |
13 | 17.3 | 8 | 180 | 0 | Merc 450SL |
12 | 16.4 | 8 | 180 | 0 | Merc 450SE |
29 | 15.8 | 8 | 264 | 1 | Ford Pantera L |
22 | 15.5 | 8 | 150 | 0 | Dodge Challenger |
14 | 15.2 | 8 | 180 | 0 | Merc 450SLC |
23 | 15.2 | 8 | 150 | 0 | AMC Javelin |
31 | 15.0 | 8 | 335 | 1 | Maserati Bora |
17 | 14.7 | 8 | 230 | 0 | Chrysler Imperial |
7 | 14.3 | 8 | 245 | 0 | Duster 360 |
24 | 13.3 | 8 | 245 | 0 | Camaro Z28 |
15 | 10.4 | 8 | 205 | 0 | Cadillac Fleetwood |
16 | 10.4 | 8 | 215 | 0 | Lincoln Continental |
df[order(df$cyl, df$mpg) , ]
arrange(tb, cyl, mpg)
dt[order(cyl, mpg) , ]
mpg | cyl | hp | am | car | |
---|---|---|---|---|---|
32 | 21.4 | 4 | 109 | 1 | Volvo 142E |
21 | 21.5 | 4 | 97 | 0 | Toyota Corona |
3 | 22.8 | 4 | 93 | 1 | Datsun 710 |
9 | 22.8 | 4 | 95 | 0 | Merc 230 |
8 | 24.4 | 4 | 62 | 0 | Merc 240D |
27 | 26.0 | 4 | 91 | 1 | Porsche 914-2 |
26 | 27.3 | 4 | 66 | 1 | Fiat X1-9 |
19 | 30.4 | 4 | 52 | 1 | Honda Civic |
28 | 30.4 | 4 | 113 | 1 | Lotus Europa |
18 | 32.4 | 4 | 66 | 1 | Fiat 128 |
20 | 33.9 | 4 | 65 | 1 | Toyota Corolla |
11 | 17.8 | 6 | 123 | 0 | Merc 280C |
6 | 18.1 | 6 | 105 | 0 | Valiant |
10 | 19.2 | 6 | 123 | 0 | Merc 280 |
30 | 19.7 | 6 | 175 | 1 | Ferrari Dino |
1 | 21.0 | 6 | 110 | 1 | Mazda RX4 |
2 | 21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
4 | 21.4 | 6 | 110 | 0 | Hornet 4 Drive |
15 | 10.4 | 8 | 205 | 0 | Cadillac Fleetwood |
16 | 10.4 | 8 | 215 | 0 | Lincoln Continental |
24 | 13.3 | 8 | 245 | 0 | Camaro Z28 |
7 | 14.3 | 8 | 245 | 0 | Duster 360 |
17 | 14.7 | 8 | 230 | 0 | Chrysler Imperial |
31 | 15.0 | 8 | 335 | 1 | Maserati Bora |
14 | 15.2 | 8 | 180 | 0 | Merc 450SLC |
23 | 15.2 | 8 | 150 | 0 | AMC Javelin |
22 | 15.5 | 8 | 150 | 0 | Dodge Challenger |
29 | 15.8 | 8 | 264 | 1 | Ford Pantera L |
12 | 16.4 | 8 | 180 | 0 | Merc 450SE |
13 | 17.3 | 8 | 180 | 0 | Merc 450SL |
5 | 18.7 | 8 | 175 | 0 | Hornet Sportabout |
25 | 19.2 | 8 | 175 | 0 | Pontiac Firebird |
df$gpm <- 1 / df$mpg
tb <- mutate(tb, gpm = 1 / mpg)
dt[ , gpm := 1 / mpg, ]
mpg | cyl | hp | am | car | gpm |
---|---|---|---|---|---|
21.0 | 6 | 110 | 1 | Mazda RX4 | 0.0476190 |
21.0 | 6 | 110 | 1 | Mazda RX4 Wag | 0.0476190 |
22.8 | 4 | 93 | 1 | Datsun 710 | 0.0438596 |
21.4 | 6 | 110 | 0 | Hornet 4 Drive | 0.0467290 |
18.7 | 8 | 175 | 0 | Hornet Sportabout | 0.0534759 |
18.1 | 6 | 105 | 0 | Valiant | 0.0552486 |
14.3 | 8 | 245 | 0 | Duster 360 | 0.0699301 |
24.4 | 4 | 62 | 0 | Merc 240D | 0.0409836 |
22.8 | 4 | 95 | 0 | Merc 230 | 0.0438596 |
19.2 | 6 | 123 | 0 | Merc 280 | 0.0520833 |
17.8 | 6 | 123 | 0 | Merc 280C | 0.0561798 |
16.4 | 8 | 180 | 0 | Merc 450SE | 0.0609756 |
17.3 | 8 | 180 | 0 | Merc 450SL | 0.0578035 |
15.2 | 8 | 180 | 0 | Merc 450SLC | 0.0657895 |
10.4 | 8 | 205 | 0 | Cadillac Fleetwood | 0.0961538 |
10.4 | 8 | 215 | 0 | Lincoln Continental | 0.0961538 |
14.7 | 8 | 230 | 0 | Chrysler Imperial | 0.0680272 |
32.4 | 4 | 66 | 1 | Fiat 128 | 0.0308642 |
30.4 | 4 | 52 | 1 | Honda Civic | 0.0328947 |
33.9 | 4 | 65 | 1 | Toyota Corolla | 0.0294985 |
21.5 | 4 | 97 | 0 | Toyota Corona | 0.0465116 |
15.5 | 8 | 150 | 0 | Dodge Challenger | 0.0645161 |
15.2 | 8 | 150 | 0 | AMC Javelin | 0.0657895 |
13.3 | 8 | 245 | 0 | Camaro Z28 | 0.0751880 |
19.2 | 8 | 175 | 0 | Pontiac Firebird | 0.0520833 |
27.3 | 4 | 66 | 1 | Fiat X1-9 | 0.0366300 |
26.0 | 4 | 91 | 1 | Porsche 914-2 | 0.0384615 |
30.4 | 4 | 113 | 1 | Lotus Europa | 0.0328947 |
15.8 | 8 | 264 | 1 | Ford Pantera L | 0.0632911 |
19.7 | 6 | 175 | 1 | Ferrari Dino | 0.0507614 |
15.0 | 8 | 335 | 1 | Maserati Bora | 0.0666667 |
21.4 | 4 | 109 | 1 | Volvo 142E | 0.0467290 |
df$gpm <- NULL
tb <- select(tb, -gpm)
dt[ , gpm := NULL, ]
mpg | cyl | hp | am | car |
---|---|---|---|---|
21.0 | 6 | 110 | 1 | Mazda RX4 |
21.0 | 6 | 110 | 1 | Mazda RX4 Wag |
22.8 | 4 | 93 | 1 | Datsun 710 |
21.4 | 6 | 110 | 0 | Hornet 4 Drive |
18.7 | 8 | 175 | 0 | Hornet Sportabout |
18.1 | 6 | 105 | 0 | Valiant |
14.3 | 8 | 245 | 0 | Duster 360 |
24.4 | 4 | 62 | 0 | Merc 240D |
22.8 | 4 | 95 | 0 | Merc 230 |
19.2 | 6 | 123 | 0 | Merc 280 |
17.8 | 6 | 123 | 0 | Merc 280C |
16.4 | 8 | 180 | 0 | Merc 450SE |
17.3 | 8 | 180 | 0 | Merc 450SL |
15.2 | 8 | 180 | 0 | Merc 450SLC |
10.4 | 8 | 205 | 0 | Cadillac Fleetwood |
10.4 | 8 | 215 | 0 | Lincoln Continental |
14.7 | 8 | 230 | 0 | Chrysler Imperial |
32.4 | 4 | 66 | 1 | Fiat 128 |
30.4 | 4 | 52 | 1 | Honda Civic |
33.9 | 4 | 65 | 1 | Toyota Corolla |
21.5 | 4 | 97 | 0 | Toyota Corona |
15.5 | 8 | 150 | 0 | Dodge Challenger |
15.2 | 8 | 150 | 0 | AMC Javelin |
13.3 | 8 | 245 | 0 | Camaro Z28 |
19.2 | 8 | 175 | 0 | Pontiac Firebird |
27.3 | 4 | 66 | 1 | Fiat X1-9 |
26.0 | 4 | 91 | 1 | Porsche 914-2 |
30.4 | 4 | 113 | 1 | Lotus Europa |
15.8 | 8 | 264 | 1 | Ford Pantera L |
19.7 | 6 | 175 | 1 | Ferrari Dino |
15.0 | 8 | 335 | 1 | Maserati Bora |
21.4 | 4 | 109 | 1 | Volvo 142E |
summarise(tb, mpg = median(mpg), hp = median(hp))
dt[ , .(mpg = median(mpg), hp = mean(hp)) , ]
mpg | hp |
---|---|
19.2 | 123 |
NOTE: The base
method returns a vector while the tidyverse
and data.table
methods return a table.
summarise_each(tb, max)
dt[ , lapply(.SD, max) , ]
mpg | cyl | hp | am | car |
---|---|---|---|---|
33.9 | 8 | 335 | 1 | Volvo 142E |
group_by(tb, cyl) %>% summarise_each(max)
dt[ , lapply(.SD, max), by = cyl ]
cyl | mpg | hp | am | car |
---|---|---|---|---|
6 | 21.4 | 175 | 1 | Valiant |
4 | 33.9 | 113 | 1 | Volvo 142E |
8 | 19.2 | 335 | 1 | Pontiac Firebird |
df.l <- reshape(df, idvar = "car", times = names(df)[names(df) != "car"], timevar = "variable", v.names="value", varying = list(names(df) [names(df) != "car"]), direction = "long")
tb.l <- pivot_longer(tb, names_to = "variable", values_to = "value", -car)
dt.l <- melt(dt, id.vars = c("car"), variable.name = "variable", value.name = "value")
car | variable | value |
---|---|---|
Mazda RX4 | mpg | 21.0 |
Mazda RX4 | cyl | 6.0 |
Mazda RX4 | hp | 110.0 |
Mazda RX4 | am | 1.0 |
Mazda RX4 Wag | mpg | 21.0 |
Mazda RX4 Wag | cyl | 6.0 |
Mazda RX4 Wag | hp | 110.0 |
Mazda RX4 Wag | am | 1.0 |
Datsun 710 | mpg | 22.8 |
Datsun 710 | cyl | 4.0 |
Datsun 710 | hp | 93.0 |
Datsun 710 | am | 1.0 |
Hornet 4 Drive | mpg | 21.4 |
Hornet 4 Drive | cyl | 6.0 |
Hornet 4 Drive | hp | 110.0 |
Hornet 4 Drive | am | 0.0 |
Hornet Sportabout | mpg | 18.7 |
Hornet Sportabout | cyl | 8.0 |
Hornet Sportabout | hp | 175.0 |
Hornet Sportabout | am | 0.0 |
Valiant | mpg | 18.1 |
Valiant | cyl | 6.0 |
Valiant | hp | 105.0 |
Valiant | am | 0.0 |
Duster 360 | mpg | 14.3 |
Duster 360 | cyl | 8.0 |
Duster 360 | hp | 245.0 |
Duster 360 | am | 0.0 |
Merc 240D | mpg | 24.4 |
Merc 240D | cyl | 4.0 |
Merc 240D | hp | 62.0 |
Merc 240D | am | 0.0 |
Merc 230 | mpg | 22.8 |
Merc 230 | cyl | 4.0 |
Merc 230 | hp | 95.0 |
Merc 230 | am | 0.0 |
Merc 280 | mpg | 19.2 |
Merc 280 | cyl | 6.0 |
Merc 280 | hp | 123.0 |
Merc 280 | am | 0.0 |
Merc 280C | mpg | 17.8 |
Merc 280C | cyl | 6.0 |
Merc 280C | hp | 123.0 |
Merc 280C | am | 0.0 |
Merc 450SE | mpg | 16.4 |
Merc 450SE | cyl | 8.0 |
Merc 450SE | hp | 180.0 |
Merc 450SE | am | 0.0 |
Merc 450SL | mpg | 17.3 |
Merc 450SL | cyl | 8.0 |
Merc 450SL | hp | 180.0 |
Merc 450SL | am | 0.0 |
Merc 450SLC | mpg | 15.2 |
Merc 450SLC | cyl | 8.0 |
Merc 450SLC | hp | 180.0 |
Merc 450SLC | am | 0.0 |
Cadillac Fleetwood | mpg | 10.4 |
Cadillac Fleetwood | cyl | 8.0 |
Cadillac Fleetwood | hp | 205.0 |
Cadillac Fleetwood | am | 0.0 |
Lincoln Continental | mpg | 10.4 |
Lincoln Continental | cyl | 8.0 |
Lincoln Continental | hp | 215.0 |
Lincoln Continental | am | 0.0 |
Chrysler Imperial | mpg | 14.7 |
Chrysler Imperial | cyl | 8.0 |
Chrysler Imperial | hp | 230.0 |
Chrysler Imperial | am | 0.0 |
Fiat 128 | mpg | 32.4 |
Fiat 128 | cyl | 4.0 |
Fiat 128 | hp | 66.0 |
Fiat 128 | am | 1.0 |
Honda Civic | mpg | 30.4 |
Honda Civic | cyl | 4.0 |
Honda Civic | hp | 52.0 |
Honda Civic | am | 1.0 |
Toyota Corolla | mpg | 33.9 |
Toyota Corolla | cyl | 4.0 |
Toyota Corolla | hp | 65.0 |
Toyota Corolla | am | 1.0 |
Toyota Corona | mpg | 21.5 |
Toyota Corona | cyl | 4.0 |
Toyota Corona | hp | 97.0 |
Toyota Corona | am | 0.0 |
Dodge Challenger | mpg | 15.5 |
Dodge Challenger | cyl | 8.0 |
Dodge Challenger | hp | 150.0 |
Dodge Challenger | am | 0.0 |
AMC Javelin | mpg | 15.2 |
AMC Javelin | cyl | 8.0 |
AMC Javelin | hp | 150.0 |
AMC Javelin | am | 0.0 |
Camaro Z28 | mpg | 13.3 |
Camaro Z28 | cyl | 8.0 |
Camaro Z28 | hp | 245.0 |
Camaro Z28 | am | 0.0 |
Pontiac Firebird | mpg | 19.2 |
Pontiac Firebird | cyl | 8.0 |
Pontiac Firebird | hp | 175.0 |
Pontiac Firebird | am | 0.0 |
Fiat X1-9 | mpg | 27.3 |
Fiat X1-9 | cyl | 4.0 |
Fiat X1-9 | hp | 66.0 |
Fiat X1-9 | am | 1.0 |
Porsche 914-2 | mpg | 26.0 |
Porsche 914-2 | cyl | 4.0 |
Porsche 914-2 | hp | 91.0 |
Porsche 914-2 | am | 1.0 |
Lotus Europa | mpg | 30.4 |
Lotus Europa | cyl | 4.0 |
Lotus Europa | hp | 113.0 |
Lotus Europa | am | 1.0 |
Ford Pantera L | mpg | 15.8 |
Ford Pantera L | cyl | 8.0 |
Ford Pantera L | hp | 264.0 |
Ford Pantera L | am | 1.0 |
Ferrari Dino | mpg | 19.7 |
Ferrari Dino | cyl | 6.0 |
Ferrari Dino | hp | 175.0 |
Ferrari Dino | am | 1.0 |
Maserati Bora | mpg | 15.0 |
Maserati Bora | cyl | 8.0 |
Maserati Bora | hp | 335.0 |
Maserati Bora | am | 1.0 |
Volvo 142E | mpg | 21.4 |
Volvo 142E | cyl | 4.0 |
Volvo 142E | hp | 109.0 |
Volvo 142E | am | 1.0 |
NOTE 1: The output order will differ between these methods.
NOTE 2: The base
reshape function creates row names from the car
variable. We’ll remove the row names to match the other table’s structure.
df.w <- reshape(df.l, idvar = "car", timevar="variable", v.names="value", direction = "wide")
tb.w <- pivot_wider(tb.l, names_from = variable, values_from = value)
dt.w <- dcast(dt.l, car ~ variable, value.var="value")
car | mpg | cyl | hp | am |
---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 110 | 1 |
Mazda RX4 Wag | 21.0 | 6 | 110 | 1 |
Datsun 710 | 22.8 | 4 | 93 | 1 |
Hornet 4 Drive | 21.4 | 6 | 110 | 0 |
Hornet Sportabout | 18.7 | 8 | 175 | 0 |
Valiant | 18.1 | 6 | 105 | 0 |
Duster 360 | 14.3 | 8 | 245 | 0 |
Merc 240D | 24.4 | 4 | 62 | 0 |
Merc 230 | 22.8 | 4 | 95 | 0 |
Merc 280 | 19.2 | 6 | 123 | 0 |
Merc 280C | 17.8 | 6 | 123 | 0 |
Merc 450SE | 16.4 | 8 | 180 | 0 |
Merc 450SL | 17.3 | 8 | 180 | 0 |
Merc 450SLC | 15.2 | 8 | 180 | 0 |
Cadillac Fleetwood | 10.4 | 8 | 205 | 0 |
Lincoln Continental | 10.4 | 8 | 215 | 0 |
Chrysler Imperial | 14.7 | 8 | 230 | 0 |
Fiat 128 | 32.4 | 4 | 66 | 1 |
Honda Civic | 30.4 | 4 | 52 | 1 |
Toyota Corolla | 33.9 | 4 | 65 | 1 |
Toyota Corona | 21.5 | 4 | 97 | 0 |
Dodge Challenger | 15.5 | 8 | 150 | 0 |
AMC Javelin | 15.2 | 8 | 150 | 0 |
Camaro Z28 | 13.3 | 8 | 245 | 0 |
Pontiac Firebird | 19.2 | 8 | 175 | 0 |
Fiat X1-9 | 27.3 | 4 | 66 | 1 |
Porsche 914-2 | 26.0 | 4 | 91 | 1 |
Lotus Europa | 30.4 | 4 | 113 | 1 |
Ford Pantera L | 15.8 | 8 | 264 | 1 |
Ferrari Dino | 19.7 | 6 | 175 | 1 |
Maserati Bora | 15.0 | 8 | 335 | 1 |
Volvo 142E | 21.4 | 4 | 109 | 1 |
We’ll first create a lookup table that pairs up the am
variable values with a text descriptor.
x | y |
---|---|
0 | automatic |
1 | manual |
Next, we join the tables.
merge(df, df.lu, by.x = "am", by.y="x", all.x = TRUE)
left_join(tb, tb.lu, by = c("am" = "x"))
dt[dt.lu, on = c("am" = "x")]
am | mpg | cyl | hp | car | y |
---|---|---|---|---|---|
0 | 18.7 | 8 | 175 | Hornet Sportabout | automatic |
0 | 22.8 | 4 | 95 | Merc 230 | automatic |
0 | 21.4 | 6 | 110 | Hornet 4 Drive | automatic |
0 | 17.3 | 8 | 180 | Merc 450SL | automatic |
0 | 18.1 | 6 | 105 | Valiant | automatic |
0 | 14.3 | 8 | 245 | Duster 360 | automatic |
0 | 24.4 | 4 | 62 | Merc 240D | automatic |
0 | 14.7 | 8 | 230 | Chrysler Imperial | automatic |
0 | 19.2 | 6 | 123 | Merc 280 | automatic |
0 | 17.8 | 6 | 123 | Merc 280C | automatic |
0 | 16.4 | 8 | 180 | Merc 450SE | automatic |
0 | 21.5 | 4 | 97 | Toyota Corona | automatic |
0 | 15.2 | 8 | 180 | Merc 450SLC | automatic |
0 | 10.4 | 8 | 205 | Cadillac Fleetwood | automatic |
0 | 10.4 | 8 | 215 | Lincoln Continental | automatic |
0 | 19.2 | 8 | 175 | Pontiac Firebird | automatic |
0 | 15.5 | 8 | 150 | Dodge Challenger | automatic |
0 | 15.2 | 8 | 150 | AMC Javelin | automatic |
0 | 13.3 | 8 | 245 | Camaro Z28 | automatic |
1 | 21.0 | 6 | 110 | Mazda RX4 | manual |
1 | 21.0 | 6 | 110 | Mazda RX4 Wag | manual |
1 | 22.8 | 4 | 93 | Datsun 710 | manual |
1 | 33.9 | 4 | 65 | Toyota Corolla | manual |
1 | 15.8 | 8 | 264 | Ford Pantera L | manual |
1 | 32.4 | 4 | 66 | Fiat 128 | manual |
1 | 30.4 | 4 | 52 | Honda Civic | manual |
1 | 30.4 | 4 | 113 | Lotus Europa | manual |
1 | 27.3 | 4 | 66 | Fiat X1-9 | manual |
1 | 26.0 | 4 | 91 | Porsche 914-2 | manual |
1 | 21.4 | 4 | 109 | Volvo 142E | manual |
1 | 19.7 | 6 | 175 | Ferrari Dino | manual |
1 | 15.0 | 8 | 335 | Maserati Bora | manual |
Base R does not have native chaining structure like the tidyverse
or data.table.
Instead, it relies on a series of standalone operations. However, one technique that can be adopted to make it seem as though a series of operations are to be run in unison is to pass each intermediate steps to the .
object as in . <- mean(x)
.
The tidyverse
adopts margeritr
’s pipe, %>%
, to piece operations together.
The data.table
pieces bracketed operations back to back as in [...][...]...
. Note that if you are to split data.table operations across multiple lines, you must split the line in the middle of an open bracket as in:
[ ,do something>][
do another thing][
and so on]
In this example, we’ll first create a new variable, gpm
, then we’ll compute the median gpm
value grouped by cyl
before sorting the summary in descending order.
df$gpm <- 1/df$mpg . <- df[ , c("cyl", "gpm")] . <- aggregate(., list(df$cyl), median) .$Group.1 <- NULL .[order(-.$gpm), ]
tb %>% mutate(gpm = 1/mpg) %>% group_by(cyl) %>% summarise(gpm = median(gpm)) %>% arrange(-gpm)
dt[ , gpm := 1/mpg, ][ order(-gpm), .(gpm = median(gpm)), by = cyl]
cyl | gpm | |
---|---|---|
3 | 8 | 0.0657895 |
2 | 6 | 0.0507614 |
1 | 4 | 0.0384615 |
Manuel Gimond, 2019