Colby R User Group--Manny Gimond--8.30.2017
The Tidyverse
There are more than 10,000 packages on CRAN!
(This does not include those available via the BioConductor repo)
“[It's] an opinionated collection of R packages designed for data science.”
The core set of packages includes tibble, ggplot2, dplyr, tidyr, readr and purrr.
Load them all with a single line of code,
library(tidyverse)
or,
library(tibble)
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
class(mtcars)
[1] "data.frame"
mtcars.t <- as_tibble(mtcars)
class(mtcars.t)
[1] "tbl_df" "tbl" "data.frame"
mtcars
mpg cyl disp hp drat wt qsec vs
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1
am gear carb
Mazda RX4 1 4 4
Mazda RX4 Wag 1 4 4
Datsun 710 1 4 1
Hornet 4 Drive 0 3 1
Hornet Sportabout 0 3 2
Valiant 0 3 1
Duster 360 0 3 4
Merc 240D 0 4 2
Merc 230 0 4 2
Merc 280 0 4 4
Merc 280C 0 4 4
Merc 450SE 0 3 3
Merc 450SL 0 3 3
Merc 450SLC 0 3 3
Cadillac Fleetwood 0 3 4
Lincoln Continental 0 3 4
Chrysler Imperial 0 3 4
Fiat 128 1 4 1
Honda Civic 1 4 2
Toyota Corolla 1 4 1
Toyota Corona 0 3 1
Dodge Challenger 0 3 2
AMC Javelin 0 3 2
Camaro Z28 0 3 4
Pontiac Firebird 0 3 2
Fiat X1-9 1 4 1
Porsche 914-2 1 5 2
Lotus Europa 1 5 2
Ford Pantera L 1 5 4
Ferrari Dino 1 5 6
Maserati Bora 1 5 8
Volvo 142E 1 4 2
mtcars.t
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0
# ... with 22 more rows, and 2 more variables: gear <dbl>,
# carb <dbl>
mtcars$h
[1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180
[15] 205 215 230 66 52 65 97 150 150 245 175 66 91 113
[29] 264 175 335 109
mtcars.t$h
NULL
… on a data frame, $ partially matches column names (h matches hp); on a tibble it does not
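Partial matching by `$` can be avoided in base R as well, since `[[` matches names exactly by default. A minimal sketch on a small made-up data frame (`cars_df` and its values are illustrative, not from the slides):

```r
library(tibble)

# Small illustrative data frame (hypothetical values)
cars_df <- data.frame(hp = c(110, 93), wt = c(2.62, 2.32))

cars_df$h        # partial match on "hp": returns the hp column
cars_df[["h"]]   # [[ ]] matches exactly by default: returns NULL

# A tibble never partial-matches; $ returns NULL (normally with a warning)
cars_tb <- as_tibble(cars_df)
suppressWarnings(cars_tb$h)
```

Using `[[` or a tibble turns a silent mismatch into a visible NULL, which is usually easier to debug.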
mtcars[ , 1]
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8
[12] 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5
[23] 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
… returns a vector
mtcars.t[ , 1]
# A tibble: 32 x 1
mpg
<dbl>
1 21.0
2 21.0
3 22.8
4 21.4
5 18.7
6 18.1
7 14.3
8 24.4
9 22.8
10 19.2
# ... with 22 more rows
… returns a one-column tibble
hist(mtcars[ , 1])
hist(mtcars.t[ , 1])
Error in hist.default(mtcars.t[, 1]): 'x' must be numeric
hist() expects a numeric vector, not a one-column tibble
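To hand a tibble column to a function that wants a vector, extract it with `[[`, `$`, or dplyr's `pull()`. A short sketch, rebuilding the `mtcars.t` tibble from the earlier slides (the `mpg_*` names are illustrative):

```r
library(tibble)
library(dplyr)

mtcars.t <- as_tibble(mtcars)

mpg_a <- mtcars.t[[1]]         # by position
mpg_b <- mtcars.t$mpg          # by name
mpg_c <- pull(mtcars.t, mpg)   # dplyr helper, handy at the end of a pipeline

is.numeric(mpg_a)              # a plain numeric vector
hist(mpg_a)                    # hist() now accepts the input
```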
data.frame(x = 1:5,
           y = 11:15,
           z = sqrt(x^2 + y^2))
Error in data.frame(x = 1:5, y = 11:15, z = sqrt(x^2 + y^2)): object 'x' not found
tibble(x = 1:5,
       y = 11:15,
       z = sqrt(x^2 + y^2))
# A tibble: 5 x 3
x y z
<int> <int> <dbl>
1 1 11 11.04536
2 2 12 12.16553
3 3 13 13.34166
4 4 14 14.56022
5 5 15 15.81139
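tibble() builds its columns one after another, so a later column can refer to an earlier one. data.frame() evaluates every argument before any column exists, hence the error. In base R the same result takes two steps. A minimal sketch (`d` is an illustrative name):

```r
# Base R: create the data frame first, then add the derived column
d <- data.frame(x = 1:5, y = 11:15)
d$z <- sqrt(d$x^2 + d$y^2)
d
```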
df <- read.csv("FAO_grains_NA.csv")
class(df)
[1] "data.frame"
library(readr)
tb <- read_csv("FAO_grains_NA.csv")
class(tb)
[1] "tbl_df" "tbl" "data.frame"
summary(df)
Country Crop
Canada :730 Barley :208
United States of America:771 Maize :208
Oats :208
Rye :208
Buckwheat :200
Grain, mixed:104
(Other) :365
Information Year
Area harvested (Ha):752 Min. :1961
Yield (Hg/Ha) :749 1st Qu.:1974
Median :1987
Mean :1987
3rd Qu.:2000
Max. :2012
Value
Min. : 0
1st Qu.: 19551
Median : 47131
Mean : 1622720
3rd Qu.: 558070
Max. :35400000
Source
Calculated data :749
FAO data based on imputation methodology: 17
FAO estimate : 73
Official data :662
summary(tb)
Country Crop Information
Length:1501 Length:1501 Length:1501
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Year Value Source
Min. :1961 Min. : 0 Length:1501
1st Qu.:1974 1st Qu.: 19551 Class :character
Median :1987 Median : 47131 Mode :character
Mean :1987 Mean : 1622720
3rd Qu.:2000 3rd Qu.: 558070
Max. :2012 Max. :35400000
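The two summaries differ because of column types: summary() counts levels for factor columns but has little to say about character columns, and read_csv() keeps text as character. Since R 4.0.0, read.csv() also defaults to stringsAsFactors = FALSE, so the factor-based output above reflects an older R version. A sketch using a small temporary file in place of FAO_grains_NA.csv (the file contents here are made up for illustration):

```r
library(readr)

# Write a tiny stand-in CSV (hypothetical data, not the FAO file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Country,Year,Value",
             "Canada,1961,100",
             "Canada,1962,120"), tmp)

df_fac <- read.csv(tmp, stringsAsFactors = TRUE)    # default before R 4.0.0
df_chr <- read.csv(tmp, stringsAsFactors = FALSE)   # default since R 4.0.0
tb     <- read_csv(tmp)

class(df_fac$Country)   # "factor"
class(df_chr$Country)   # "character"
class(tb$Country)       # "character"
```

If level counts are wanted from a tibble, converting a column with factor() before calling summary() is one option.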
subset(mtcars, hp > 200)
mpg cyl disp hp drat wt qsec vs
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0
Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0
am gear carb
Duster 360 0 3 4
Cadillac Fleetwood 0 3 4
Lincoln Continental 0 3 4
Chrysler Imperial 0 3 4
Camaro Z28 0 3 4
Ford Pantera L 1 5 4
Maserati Bora 1 5 8
WARNING: subset() relies on non-standard evaluation; its help page recommends it for interactive use only, not for programming.
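One way subset() can surprise: it looks names up in the data frame first, so a workspace variable that shares a column's name is silently ignored. A minimal sketch (the variable `cyl` is introduced here only for illustration):

```r
cyl <- 4   # intended filter value; same name as a column in mtcars

# Both sides of == resolve to the column, so every row is kept
nrow(subset(mtcars, cyl == cyl))     # 32

# Standard evaluation with [ ] is unambiguous
nrow(mtcars[mtcars$cyl == cyl, ])    # 11
```

dplyr's filter() resolves names the same way, so in either case the safest habit is to give the variable a name that is not also a column name.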
filter(mtcars, hp > 200)
mpg cyl disp hp drat wt qsec vs am gear carb
1 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
2 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
3 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
4 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
5 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
6 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
7 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
mtcars[ , c("disp","hp")]
disp hp
Mazda RX4 160.0 110
Mazda RX4 Wag 160.0 110
Datsun 710 108.0 93
Hornet 4 Drive 258.0 110
Hornet Sportabout 360.0 175
Valiant 225.0 105
Duster 360 360.0 245
Merc 240D 146.7 62
Merc 230 140.8 95
Merc 280 167.6 123
Merc 280C 167.6 123
Merc 450SE 275.8 180
Merc 450SL 275.8 180
Merc 450SLC 275.8 180
Cadillac Fleetwood 472.0 205
Lincoln Continental 460.0 215
Chrysler Imperial 440.0 230
Fiat 128 78.7 66
Honda Civic 75.7 52
Toyota Corolla 71.1 65
Toyota Corona 120.1 97
Dodge Challenger 318.0 150
AMC Javelin 304.0 150
Camaro Z28 350.0 245
Pontiac Firebird 400.0 175
Fiat X1-9 79.0 66
Porsche 914-2 120.3 91
Lotus Europa 95.1 113
Ford Pantera L 351.0 264
Ferrari Dino 145.0 175
Maserati Bora 301.0 335
Volvo 142E 121.0 109
select(mtcars, disp, hp)
disp hp
Mazda RX4 160.0 110
Mazda RX4 Wag 160.0 110
Datsun 710 108.0 93
Hornet 4 Drive 258.0 110
Hornet Sportabout 360.0 175
Valiant 225.0 105
Duster 360 360.0 245
Merc 240D 146.7 62
Merc 230 140.8 95
Merc 280 167.6 123
Merc 280C 167.6 123
Merc 450SE 275.8 180
Merc 450SL 275.8 180
Merc 450SLC 275.8 180
Cadillac Fleetwood 472.0 205
Lincoln Continental 460.0 215
Chrysler Imperial 440.0 230
Fiat 128 78.7 66
Honda Civic 75.7 52
Toyota Corolla 71.1 65
Toyota Corona 120.1 97
Dodge Challenger 318.0 150
AMC Javelin 304.0 150
Camaro Z28 350.0 245
Pontiac Firebird 400.0 175
Fiat X1-9 79.0 66
Porsche 914-2 120.3 91
Lotus Europa 95.1 113
Ford Pantera L 351.0 264
Ferrari Dino 145.0 175
Maserati Bora 301.0 335
Volvo 142E 121.0 109
mtcars$ratio <- mtcars$wt / mtcars$hp
head(mtcars,3)
mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4
carb ratio
Mazda RX4 4 0.02381818
Mazda RX4 Wag 4 0.02613636
Datsun 710 1 0.02494624
With $, each new column requires its own assignment.
mtcars <- mutate(mtcars, ratio = wt / hp)
head(mtcars,3)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
ratio
1 0.02381818
2 0.02613636
3 0.02494624
mtcars <- mutate(mtcars, ratio  = wt / hp,
                         ratio2 = hp / disp)
head(mtcars,3)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
ratio ratio2
1 0.02381818 0.6875000
2 0.02613636 0.6875000
3 0.02494624 0.8611111
aggregate(mtcars$mpg, by=list(mtcars$cyl), FUN=mean)
Group.1 x
1 4 26.66364
2 6 19.74286
3 8 15.10000
Each summary function requires its own call.
group_by(mtcars, cyl) %>% summarise(mean(mpg))
# A tibble: 3 x 2
cyl `mean(mpg)`
<dbl> <dbl>
1 4 26.66364
2 6 19.74286
3 8 15.10000
group_by(mtcars, cyl) %>% summarise(mean(mpg), mean(hp))
# A tibble: 3 x 3
cyl `mean(mpg)` `mean(hp)`
<dbl> <dbl> <dbl>
1 4 26.66364 82.63636
2 6 19.74286 122.28571
3 8 15.10000 209.21429
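The backticked column names above (`mean(mpg)`) are awkward to reuse. Naming each summary inside summarise() gives cleaner output. A minimal sketch (the names `mpg_mean`, `hp_mean` and `n_cars` are arbitrary choices):

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mpg_mean = mean(mpg),   # named summaries become column names
            hp_mean  = mean(hp),
            n_cars   = n())         # n() counts rows per group

by_cyl
```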
library(lubridate)
y <- mdy("1/23/2016", "12/1/1901", "11/23/2016")
ifelse( year(y) != 2016, mdy(NA), y)
[1] 16823 NA 17128
ifelse() strips attributes, so classes such as Date are lost (plain numeric and character vectors are unaffected)
library(lubridate)
y <- mdy("1/23/2016", "12/1/1901", "11/23/2016")
if_else( year(y) != 2016, mdy(NA), y)
[1] "2016-01-23" NA "2016-11-23"
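if_else() keeps the Date class because it requires both branches to have compatible types and preserves their attributes. A sketch that uses base as.Date() in place of lubridate, plus a base R alternative that avoids ifelse() altogether (`not_2016`, `out` and `y2` are illustrative names):

```r
library(dplyr)

y <- as.Date(c("2016-01-23", "1901-12-01", "2016-11-23"))
not_2016 <- format(y, "%Y") != "2016"

# dplyr: both branches are Dates, so the result is a Date
out <- if_else(not_2016, as.Date(NA), y)
class(out)    # "Date"

# Base R alternative: assign NA by logical index; the class is untouched
y2 <- y
y2[not_2016] <- NA
class(y2)     # "Date"
```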
x <- as.factor( c("banana", "pear", "apple"))
ifelse(x == "pear", "apple", x)
[1] "2" "apple" "1"
ifelse() returns a factor's underlying integer codes, not its labels
x <- as.factor( c( "banana", "pear", "apple"))
recode(x , "pear" = "apple")
[1] banana apple apple
Levels: apple banana
z <- c(1, -2, 102)
ifelse(z < 0, abs(z),
       ifelse(z > 100, z - 100, z))
[1] 1 2 2
z <- c(1, -2, 102)
case_when(z < 0   ~ abs(z),
          z > 100 ~ z - 100,
          TRUE    ~ z)
[1] 1 2 2
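case_when() tests its conditions in order and uses the first one that is TRUE, just like the nested ifelse() calls it replaces, so the order of the conditions matters. A small sketch with deliberately overlapping conditions (the labels are made up for illustration):

```r
library(dplyr)

z <- c(1, -2, 102)

# 102 satisfies both z > 0 and z > 100, but the first match wins
labels <- case_when(z > 0   ~ "positive",
                    z > 100 ~ "large",
                    TRUE    ~ "other")
labels   # "positive" "other" "positive"
```

Putting the most specific condition first (here, z > 100) avoids this kind of shadowing.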