Colby R User Group

What’s new in R 4.1

While piping operations are not uncommon in many programming environments, piping has only recently found its way into the R programming environment by way of Stefan Milton Bache’s `magrittr` package (now part of the tidyverse suite of packages). Its infix operator is written as `%>%`.

Take the following series of operations:

```
dat1 <- subset(mtcars, select = c(hp, mpg))
summary(dat1)
```

```
## hp mpg
## Min. : 52.0 Min. :10.40
## 1st Qu.: 96.5 1st Qu.:15.43
## Median :123.0 Median :19.20
## Mean :146.7 Mean :20.09
## 3rd Qu.:180.0 3rd Qu.:22.80
## Max. :335.0 Max. :33.90
```

The `mtcars` dataframe goes through two operations: a table subset, then a summary operation. This approach requires that an intermediate object (`dat1`) be created.

A more succinct chunk would look like this:

`summary( subset(mtcars, select = c(hp, mpg)))`

```
## hp mpg
## Min. : 52.0 Min. :10.40
## 1st Qu.: 96.5 1st Qu.:15.43
## Median :123.0 Median :19.20
## Mean :146.7 Mean :20.09
## 3rd Qu.:180.0 3rd Qu.:22.80
## Max. :335.0 Max. :33.90
```

However, we are trading readability for succinctness.

A compromise between the two using the pipe looks like this:

```
library(magrittr)
mtcars %>%
  subset(select = mpg:hp) %>%
  summary()
```

```
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
```

This approach avoids the need for intermediate objects while offering an easy-to-follow workflow.

R version 4.1 introduces the new native pipe: `|>`. It behaves much like `%>%`, at least from the user’s perspective. So, the above code chunk can be written without relying on the `magrittr` package as follows:

```
mtcars |>
  subset(select = mpg:hp) |>
  summary()
```

```
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
```

RStudio offers a shortcut key for the `magrittr` pipe: `Ctrl + Shift + M` on Windows machines and `Cmd + Shift + M` on Macs.

RStudio does not yet offer a dedicated shortcut key for the native pipe, but it does let you choose which pipe is assigned to that shortcut key. This option can be set via the *Options* menu (note that as of this writing, this feature is only available in the preview version of RStudio).

A pipe feeds the contents (or output) from the left hand side (LHS) into the first *unnamed* argument of the right hand side (RHS) function. So in the following example, the pipe feeds the `mtcars` dataframe into the first argument of `subset()`.

`mtcars |> subset(select = mpg:hp)`

The first argument in `subset` is the data object argument, `x`. Note that `subset` has several methods. If a dataframe is passed to `subset`, the method called is `subset.data.frame()`. We can list its arguments using the following command.

`formalArgs(subset.data.frame)`

`## [1] "x" "subset" "select" "drop" "..."`

The first argument is `x =` (the input dataframe). So in the above piping operation, `mtcars` is passed as the value of the `x` argument of the `subset` function.
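One way to see this for yourself is to capture the piped expression with `quote()` and compare it with the ordinary nested call. The following is a minimal sketch, assuming R 4.1 or later; the object names (`piped`, `nested`, `res_piped`, `res_nested`) are made up for illustration.

```r
# Requires R >= 4.1 for the native pipe.
# quote() captures an expression without evaluating it, so we can
# inspect what the parser turns the piped expression into.
piped  <- quote(mtcars |> subset(select = mpg:hp))
nested <- quote(subset(mtcars, select = mpg:hp))

print(piped)               # shows the call as R stores it
identical(piped, nested)   # compare the two captured calls

# Evaluating both forms should return the same dataframe.
res_piped  <- mtcars |> subset(select = mpg:hp)
res_nested <- subset(mtcars, select = mpg:hp)
identical(res_piped, res_nested)
```

Printing `piped` shows the LHS sitting in the first argument position of `subset()`, which is the slot matched to `x`.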

Knowing this can help troubleshoot unwelcome scenarios. For example, what happens if the LHS gets piped to a function on the RHS that does not have input data as its first argument?

`mtcars |> lm(hp ~ mpg)`

`## Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame`

`lm` has its data input argument, `data`, as its second argument. Hence, the pipe is assigning `mtcars` to `formula`, which is the first argument in the `lm` function.

`formalArgs(lm)`

```
## [1] "formula" "data" "subset" "weights" "na.action"
## [6] "method" "model" "x" "y" "qr"
## [11] "singular.ok" "contrasts" "offset" "..."
```

You’ll note that we defined the formula, `hp ~ mpg`, in the above code chunk; however, it is not explicitly assigned to the `formula` argument. So R is interpreting the above piping operation as:

`lm(formula = mtcars, data = hp ~ mpg)`

which generates an error message.
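This argument matching can be checked without running the model. The sketch below, assuming R 4.1 or later, uses base R's `match.call()` to spell out which formal argument each supplied value is matched to; `piped_call` and `matched` are illustrative names.

```r
# Requires R >= 4.1 for the native pipe.
# Capture the piped expression without evaluating it (so no error is raised).
piped_call <- quote(mtcars |> lm(hp ~ mpg))
print(piped_call)

# match.call() matches the supplied arguments to lm's formals and
# returns the call with every argument fully named.
matched <- match.call(lm, piped_call)
print(matched)
```

The printed result shows `mtcars` attached to `formula` and `hp ~ mpg` attached to `data`, which is the mismatch behind the error.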

One solution is to explicitly **name** the `formula` argument to prevent the pipe from assigning `mtcars` to `formula`:

`mtcars |> lm(formula = hp ~ mpg)`

```
##
## Call:
## lm(formula = hp ~ mpg, data = mtcars)
##
## Coefficients:
## (Intercept) mpg
## 324.08 -8.83
```

In the above example, the `formula =` argument is explicitly spelled out, thus forcing the pipe to look for the next argument not explicitly named in the code chunk. Once found, it passes the LHS as that argument’s value. In the above code chunk, this next argument is `data` (which is what we want to pipe `mtcars` into). This works with both `|>` and `%>%`.
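The same check can be repeated for the named-argument version. This is a minimal sketch, assuming R 4.1 or later; `matched`, `fit_piped` and `fit_direct` are illustrative names.

```r
# Requires R >= 4.1 for the native pipe.
# With formula named explicitly, the piped LHS falls through to the
# next unmatched formal, which for lm() is data.
matched <- match.call(lm, quote(mtcars |> lm(formula = hp ~ mpg)))
print(matched)

# The piped model and the conventionally written model should agree.
fit_piped  <- mtcars |> lm(formula = hp ~ mpg)
fit_direct <- lm(formula = hp ~ mpg, data = mtcars)
all.equal(coef(fit_piped), coef(fit_direct))
```

Comparing coefficients rather than the whole model objects avoids spurious differences, since each fitted object stores its own call.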

In some cases, naming arguments (as demonstrated in the previous example) may not be suitable. For example, the following plot function does not generate a scatter plot of `hp` vs `mpg` as we might have expected, even though we are explicitly naming the argument being assigned the `hp ~ mpg` formula.

`mtcars |> plot(formula = hp ~ mpg)`