The native pipe

Pipe background

While piping operations are not uncommon in many programming environments, piping has only recently found its way into the R programming environment by way of Stefan Milton Bache’s magrittr package (now part of the tidyverse suite of packages). Its infix operator is written as %>%.

So why bother with a pipe?

Take the following series of operations:

dat1 <- subset(mtcars, select = c(hp, mpg))
summary(dat1)
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

The mtcars dataframe is going through two operations: a table subset, then a summary operation. This approach requires creating an intermediate object, dat1.

A more succinct chunk would look like this:

summary(subset(mtcars, select = c(hp, mpg)))
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

However, we are trading readability for succinctness.

A compromise between the two using the pipe looks like this:

library(magrittr)
mtcars %>% 
  subset(select = c(hp, mpg)) %>%
  summary()
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

This approach avoids the need for intermediate objects while offering an easy-to-follow workflow.

A pipe was not native to R … until now

R version 4.1.0 introduced a native pipe operator: |>. From the user’s perspective, it behaves much like %>%. So the above code chunk can be written without relying on the magrittr package as follows:

mtcars |> 
  subset(select = c(hp, mpg)) |> 
  summary()
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90
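As a quick sanity check, the sketch below compares a native-pipe chain with the nested call summary(subset(mtcars, select = c(hp, mpg))) shown earlier. It assumes R 4.1.0 or later; piped and nested are illustrative names only.

```r
# Native pipe version: no extra packages needed in R >= 4.1.0
piped <- mtcars |>
  subset(select = c(hp, mpg)) |>
  summary()

# Nested version of the same two operations
nested <- summary(subset(mtcars, select = c(hp, mpg)))

# Both produce the same summary table
identical(piped, nested)
## [1] TRUE
```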

RStudio shortcut key

RStudio offers a shortcut key for the magrittr pipe: Ctrl + Shift + M on Windows machines and Cmd + Shift + M on Macs.

RStudio does not yet offer a dedicated shortcut key for the native pipe, but it does let you choose which pipe that shortcut inserts. This option can be set via Tools > Global Options > Code (note that, as of this writing, this feature is only available in the preview version of RStudio).

What does a pipe do exactly?

A pipe feeds the contents (or output) of the left-hand side (LHS) into the right-hand side (RHS) function call as its first positional argument; R then matches that value to the first argument not already named in the call. So in the following example, the pipe feeds the mtcars dataframe into the first argument of subset().

mtcars |> subset(select = mpg:hp)

The first argument in subset is the data object argument, x. Note that subset has several methods. If a dataframe is passed to subset, the method called is subset.data.frame(). We can list its arguments using the following command.

formalArgs(subset.data.frame)
## [1] "x"      "subset" "select" "drop"   "..."

The first argument is x, the input dataframe. So in the above piping operation, mtcars is passed to the x argument of subset().
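Because |> is handled by R’s parser, one way to see exactly what the pipe does is to quote an expression and inspect the call R builds from it. The sketch below (R 4.1.0 or later; piped_call is an illustrative name) also uses base R’s match.call(), which returns a call with every argument labelled by its full formal name, to confirm that mtcars lands in x.

```r
# quote() captures an expression without evaluating it. The parser has
# already rewritten `lhs |> f(...)` into `f(lhs, ...)`.
piped_call <- quote(mtcars |> subset(select = mpg:hp))
piped_call
## subset(mtcars, select = mpg:hp)

# match.call() binds the call's arguments to the formals of
# subset.data.frame() and names each one explicitly.
match.call(subset.data.frame, piped_call)
## subset(x = mtcars, select = mpg:hp)
```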

Knowing this can help troubleshoot unwelcome scenarios. For example, what happens if the LHS gets piped to a function on the RHS that does not have input data as its first argument?

mtcars |> lm(hp ~ mpg)
## Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame

lm() takes its input data as its second argument, data. Hence, the pipe assigns mtcars to formula, which is the first argument of lm().

formalArgs(lm)
##  [1] "formula"     "data"        "subset"      "weights"     "na.action"  
##  [6] "method"      "model"       "x"           "y"           "qr"         
## [11] "singular.ok" "contrasts"   "offset"      "..."

You’ll note that we defined the formula, hp ~ mpg, in the above code chunk; however, it is not explicitly assigned to the formula argument. Since mtcars occupies the first position, the formula falls to the second argument, data. So R interprets the above piping operation as:

lm(formula = mtcars, data = hp ~ mpg)

which generates an error message.
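This reading can be confirmed without running the failing model: match.call() binds a call’s arguments to a function’s formals and returns the call with every argument named. A minimal sketch (R 4.1.0 or later; bad_call is an illustrative name):

```r
# The quoted pipe is rewritten to lm(mtcars, hp ~ mpg) by the parser;
# match.call() then shows which formal each positional argument fills.
bad_call <- match.call(lm, quote(mtcars |> lm(hp ~ mpg)))
bad_call
## lm(formula = mtcars, data = hp ~ mpg)
```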

One solution is to explicitly name the formula argument to prevent the pipe from assigning mtcars to formula:

mtcars |> lm(formula = hp ~ mpg)
## 
## Call:
## lm(formula = hp ~ mpg, data = mtcars)
## 
## Coefficients:
## (Intercept)          mpg  
##      324.08        -8.83

In the above example, the formula = argument is explicitly spelled out, so R matches hp ~ mpg to formula first. The LHS, which the pipe inserts as the first positional argument, is then matched to the next argument not explicitly named in the call. Here, that argument is data, which is exactly where we want to pipe mtcars. This works with both |> and %>%.
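The effect of naming formula can be checked with base R’s match.call(), which returns a call with every argument labelled by its full formal name. A minimal sketch (R 4.1.0 or later; good_call is an illustrative name); the result matches the Call line printed by lm() above.

```r
# formula is matched by name first; the piped mtcars is then bound to
# the first formal still unmatched, which is data.
good_call <- match.call(lm, quote(mtcars |> lm(formula = hp ~ mpg)))
good_call
## lm(formula = hp ~ mpg, data = mtcars)
```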

Naming arguments may not always work

In some cases, naming arguments (as demonstrated in the previous example) may not be suitable. For example, the following plot() call does not generate a scatter plot of hp vs. mpg as we might expect, even though we explicitly name the argument being assigned the hp ~ mpg formula.

mtcars |> plot(formula = hp ~ mpg)