The native pipe

Pipe background

While piping operations are not uncommon in many programming environments, piping has only recently found its way into the R programming environment by way of Stefan Milton Bache’s magrittr package (now part of the tidyverse suite of packages). Its infix operator is written as %>%.

So why bother with a pipe?

Take the following series of operations:

dat1 <- subset(mtcars, select = c(hp, mpg))
summary(dat1)
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

The mtcars data frame goes through two operations: a subset, then a summary. This approach requires creating an intermediate object, dat1.

A more succinct chunk would look like this:

summary(subset(mtcars, select = c(hp, mpg)))
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

However, nested calls must be read from the inside out, so we are trading readability for succinctness.

A compromise between the two using the pipe looks like this:

library(magrittr)
mtcars %>% 
  subset(select = c(hp, mpg)) %>%
  summary()
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

This approach avoids the need for intermediate objects while offering an easy-to-follow workflow.

A pipe was not native to R … until now

R version 4.1 introduces the new native pipe: |>. It behaves much like %>%, at least from the user’s perspective. So, the above code chunk can be written without relying on the magrittr package as follows:

mtcars |> 
  subset(select = c(hp, mpg)) |> 
  summary()
##        hp             mpg       
##  Min.   : 52.0   Min.   :10.40  
##  1st Qu.: 96.5   1st Qu.:15.43  
##  Median :123.0   Median :19.20  
##  Mean   :146.7   Mean   :20.09  
##  3rd Qu.:180.0   3rd Qu.:22.80  
##  Max.   :335.0   Max.   :33.90

RStudio shortcut key

RStudio offers a shortcut key for the magrittr pipe: Ctrl + Shift + M on Windows machines and Cmd + Shift + M on Macs.

RStudio does not yet offer a dedicated shortcut key for the native pipe, but it does let you choose which pipe that shortcut key inserts. This option can be set via the Options menu (note that as of this writing, this feature is only available in the preview version of RStudio).

What does a pipe do exactly?

A pipe feeds the contents (or output) of the left-hand side (LHS) into the first argument of the right-hand side (RHS) function that is not explicitly named in the call. So in the following example, the pipe feeds the mtcars data frame into the first argument of subset().

mtcars |> subset(select = mpg:hp)

The first argument in subset is the data object argument, x. Note that subset is a generic function with several methods. If a data frame is passed to subset, the method called is subset.data.frame(). We can list its arguments using the following command.

formalArgs(subset.data.frame)
## [1] "x"      "subset" "select" "drop"   "..."

The first argument is x, the input data frame. So in the above piping operation, mtcars is passed to the x argument of the subset function.
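Since the pipe simply supplies mtcars as the first argument, the piped call and the equivalent nested call should return the same object. A quick check is sketched below; the object names sub_piped and sub_nested are only illustrative.

```r
# Requires R >= 4.1 for the native pipe |>
sub_piped  <- mtcars |> subset(select = c(hp, mpg))
sub_nested <- subset(mtcars, select = c(hp, mpg))

# mtcars ends up in the x argument in both cases, so the results match
identical(sub_piped, sub_nested)
## [1] TRUE
```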

Knowing this can help troubleshoot unwelcome scenarios. For example, what happens if the LHS gets piped to a function on the RHS that does not have input data as its first argument?

mtcars |> lm(hp ~ mpg)
## Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame

lm takes its input data through its second argument, data. Hence, the pipe assigns mtcars to formula, the first argument of the lm function.

formalArgs(lm)
##  [1] "formula"     "data"        "subset"      "weights"     "na.action"  
##  [6] "method"      "model"       "x"           "y"           "qr"         
## [11] "singular.ok" "contrasts"   "offset"      "..."

You’ll note that we defined the formula, hp ~ mpg, in the above code chunk; however, it is not explicitly assigned to the formula argument. So R interprets the above piping operation as:

lm(formula = mtcars, data = hp ~ mpg)

which generates an error message.
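One way to see the call R actually evaluates is to quote the piped expression. quote() returns the parsed expression without evaluating it, and because the native pipe is rewritten at parse time, the result is the transformed call (a sketch; requires R 4.1 or later):

```r
# quote() does not evaluate anything, so no error is raised here
quote(mtcars |> lm(hp ~ mpg))
## lm(mtcars, hp ~ mpg)
```

mtcars lands in the first position, which lm() matches to formula.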

One solution is to explicitly name the formula argument to prevent the pipe from assigning mtcars to formula:

mtcars |> lm(formula = hp ~ mpg)
## 
## Call:
## lm(formula = hp ~ mpg, data = mtcars)
## 
## Coefficients:
## (Intercept)          mpg  
##      324.08        -8.83

In the above example, the formula argument is named explicitly. The pipe itself simply inserts the LHS as the first positional argument of the call, i.e. lm(mtcars, formula = hp ~ mpg). R's standard argument matching then binds the named argument first and assigns mtcars to the first argument not already matched by name. Here, that argument is data (which is where we want to pipe mtcars). This works with both |> and %>%.
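The matching rule can be seen with a small toy function. show_args() below is hypothetical (not part of base R or any package); it simply returns which value ended up in which argument.

```r
# Hypothetical toy function: reports the value bound to each argument
show_args <- function(first, second) {
  c(first = first, second = second)
}

# Nothing named: the LHS is matched to `first`
"lhs" |> show_args("rhs")            # returns c(first = "lhs", second = "rhs")

# `first` is named, so the LHS falls through to `second`
"lhs" |> show_args(first = "rhs")    # returns c(first = "rhs", second = "lhs")
```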

Naming arguments may not always work

In some cases, naming arguments (as demonstrated in the previous example) may not be suitable. For example, the following call to plot does not generate a scatter plot of hp vs. mpg as we might expect, even though we explicitly name the argument being assigned the hp ~ mpg formula.

mtcars |> plot(formula = hp ~ mpg)

While the above does not generate an error, it’s not generating the desired plot (i.e. a single scatter plot of hp vs. mpg).

Even though the generic plot function accepts a formula, it does not have formula as an argument:

args(plot)
## function (x, y, ...) 
## NULL

So plot effectively ignores the formula = hp ~ mpg argument in our code chunk and is, in essence, running plot(mtcars), which generates a scatter plot matrix of all pairs of variables in the data.

So why does plot accept a formula yet not recognize a formula argument? plot is a generic function: it passes its arguments on to a method chosen according to the class of its first argument, x. Because the pipe supplies mtcars, a data frame, as that first argument, the data frame method is used. The plot method for a formula is graphics:::plot.formula. So, to make use of a named argument, you would need to modify the previous chunk to call the plot.formula method directly:

mtcars |> graphics:::plot.formula(formula = hp ~ mpg)
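The dispatch step can be inspected directly. The sketch below checks the class of mtcars and uses getS3method() from the utils package to confirm that both methods exist; getS3method() also finds methods that are not exported from their package.

```r
# plot() picks a method based on the class of its first argument
class(mtcars)
## [1] "data.frame"

# Method used by mtcars |> plot(...), since x is a data frame
is.function(getS3method("plot", "data.frame"))
## [1] TRUE

# Method used when x is a formula, e.g. plot(hp ~ mpg, data = mtcars)
is.function(getS3method("plot", "formula"))
## [1] TRUE
```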

This approach to getting around named-argument roadblocks can be time-consuming and frustrating. A few simpler solutions are presented next.

%>% offers the placeholder ., |> does not

One notable difference between |> and %>% is that |> lacks a placeholder. magrittr’s %>% offers the . placeholder, which can be used to explicitly specify where the LHS is to be placed in the RHS function call. For example, to work around the missing formula argument in the generic plot function, you could place a . wherever you want the LHS to be piped into:

mtcars %>% plot(hp ~ mpg, data = .)

Note that the only argument being named is data, the argument that receives the LHS.

In R 4.1, the native pipe does not have a placeholder. This is to maintain its “viable syntax transformation”.
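Later releases relaxed this restriction: starting with R 4.2.0, |> accepts _ as a placeholder, provided it is passed to a named argument. The sketch below assumes R 4.2.0 or later and will not parse in R 4.1; the object name fit_ph is only illustrative.

```r
# Requires R >= 4.2.0: `_` marks where the LHS goes and must be
# supplied to a named argument (here, data)
fit_ph <- mtcars |> lm(hp ~ mpg, data = _)

# Same model as lm(formula = hp ~ mpg, data = mtcars)
coef(fit_ph)
```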

A solution that works with |> (and also with %>%) is to embed an anonymous function.

Using anonymous functions in pipes

An anonymous function is a function that is not assigned a name. For example, the following function is a named function.

my_fun <- function(x) sqrt(x)

The above code chunk creates a function named my_fun(). Naming a function allows us to reuse this function anywhere in an R session. For example,

my_fun(20)
## [1] 4.472136
my_fun(3)
## [1] 1.732051

An anonymous function is typically used only once and is usually passed as an argument to other functions, such as apply or one of its many variants. An anonymous function can also be defined and called immediately by wrapping it in parentheses:

(function(x) sqrt(x))(20)
## [1] 4.472136
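As a short illustration of an anonymous function passed to another function, sapply() below applies an unnamed function(x) sqrt(x) to each element of a vector; the function is never assigned a name.

```r
# The anonymous function is created, used by sapply(), then discarded
sapply(c(20, 3), function(x) sqrt(x))
## [1] 4.472136 1.732051
```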

Continuing with the plot function example, using an anonymous function to explicitly indicate where to place the LHS in the RHS function would look like:

mtcars |> (function(x) plot(hp ~ mpg, data = x))()

Here, we explicitly define the placeholder name (x in the above example). Note that you could use any other valid name, even the . character.
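The same pattern works for any function whose data argument is not first. As a sketch (requires R 4.1 or later; fit_anon is an illustrative name), the earlier lm() example can be written with an anonymous function instead of naming the formula argument:

```r
# x is the name chosen for the LHS inside the anonymous function
fit_anon <- mtcars |> (function(x) lm(hp ~ mpg, data = x))()
coef(fit_anon)
```

The coefficients should match those of lm(formula = hp ~ mpg, data = mtcars) shown earlier.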

Anonymous functions also work with the %>% pipe.

Differences in performance between %>% and |>

Under the hood, the native pipe is distinctly different from its magrittr counterpart: %>% is a function, while |> is not. Calling a function adds a small overhead to every %>% operation. |> is nothing more than a syntax transformation, which means that R parses 10 |> sqrt() as sqrt(10). On the other hand, 10 %>% sqrt() is parsed as `%>%`(10, sqrt()), i.e. a call to the %>% function, which in turn evaluates sqrt(10), so two functions are processed instead of one.
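Both claims can be checked with quote(), which returns the parsed expression without evaluating it. The sketch below assumes R 4.1 or later; magrittr does not need to be loaded just to parse an expression containing %>%.

```r
# The native pipe has already disappeared from the parsed expression...
quote(10 |> sqrt())
## sqrt(10)

# ...whereas %>% remains a call whose function is `%>%`,
# with 10 and sqrt() as its two arguments
expr <- quote(10 %>% sqrt())
expr[[1]]
```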

This overhead will not be noticeable to most users. But if you are running a series of piping operations in a loop, that overhead may have a measurable impact on performance. The following plot compares the performance of sqrt(10), 10 |> sqrt() and 10 %>% sqrt(). Each expression is evaluated 10 million times.

As expected, 10 |> sqrt()’s performance is identical to sqrt(10) (recall that |> is a simple syntax transformation and not a function).
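To try a comparison like this on your own machine, a rough sketch using base R's system.time() is shown below. It assumes R 4.1 or later, uses 1 million iterations rather than 10 million to keep the run short, and only times the magrittr version if that package is installed. Timings vary by machine, so no results are shown.

```r
n <- 1e6  # number of iterations (fewer than the 10 million used above)

t_base   <- system.time(for (i in seq_len(n)) sqrt(10))
t_native <- system.time(for (i in seq_len(n)) 10 |> sqrt())  # parsed as sqrt(10)

# Time %>% only if magrittr is available
t_magrittr <- NULL
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  t_magrittr <- system.time(for (i in seq_len(n)) 10 %>% sqrt())
}

# Elapsed (wall-clock) seconds for each approach; NA if magrittr was skipped
sapply(list(base = t_base, native = t_native, magrittr = t_magrittr),
       function(tm) if (is.null(tm)) NA else tm[["elapsed"]])
```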

The function shorthand \

R 4.1 also introduces \( ) as a shorthand for function( ). The shorthand notation can help keep lines of code short when implementing an anonymous function. For example, the following two lines of code perform the exact same operation.

mtcars |> (function(x) plot(hp ~ mpg, data = x))()
mtcars |> (       \(x) plot(hp ~ mpg, data = x))()

The shorthand notation can also be used with named functions:

f1 <- function(x,y) x + y
f1 <-        \(x,y) x + y
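A quick sketch of the shorthand in use (requires R 4.1 or later). The second call shows where \( ) tends to pay off: short anonymous functions passed to another function.

```r
# \(x, y) is shorthand for function(x, y)
f1 <- \(x, y) x + y
f1(2, 3)
## [1] 5

# A short anonymous function passed to sapply()
sapply(1:3, \(x) x^2)
## [1] 1 4 9
```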

However, the shorthand notation may impede readability: it is easier to spot function than it is to spot \ when scanning an R script for function definitions.

New color palettes

R version 4.0 introduced new categorical color palettes, and these are available in R 4.1 as well. Prior to version 4.0, R offered the following categorical color palette:

# Before version 4.0
palette()
## [1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow" 
## [8] "gray"

R version 4.0 and later offer a different default set of colors that do a better job of preserving perceived consistency in lightness and saturation.

# Version 4.0 and later
palette()
## [1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
## [8] "gray62"

R 4.0 and later also offer additional categorical palettes, for a total of 16 palettes. The palette names can be listed via the palette.pals() function.

palette.pals()
##  [1] "R3"              "R4"              "ggplot2"         "Okabe-Ito"      
##  [5] "Accent"          "Dark 2"          "Paired"          "Pastel 1"       
##  [9] "Pastel 2"        "Set 1"           "Set 2"           "Set 3"          
## [13] "Tableau 10"      "Classic Tableau" "Polychrome 36"   "Alphabet"

To view the list of colors associated with a palette (e.g. the "Accent" palette), type the following:

palette("Accent")
palette()
## [1] "#7FC97F" "#BEAED4" "#FDC086" "#FFFF99" "#386CB0" "#F0027F" "#BF5B17"
## [8] "gray40"

Note that the first line of code in the above code chunk changes the color palette to "Accent" for the rest of the current R session, so subsequent plots, such as the following boxplot, use the Accent colors.

boxplot(log(decrease) ~ treatment, data = OrchardSprays,
        col = OrchardSprays$treatment)
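If you only want to look at a palette's colors without changing the session palette, the palette.colors() function returns them directly. A minimal sketch (current_pal and accent are illustrative names; with the default n = NULL, all colors of the palette are returned):

```r
current_pal <- palette()                       # palette currently in effect
accent <- palette.colors(palette = "Accent")   # all colors of "Accent"
accent

# The session palette has not been changed
identical(palette(), current_pal)
## [1] TRUE
```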

To revert to the default palette, set the palette name to "R4".

palette("R4")
palette()
## [1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
## [8] "gray62"
boxplot(log(decrease) ~ treatment, data = OrchardSprays,
        col = OrchardSprays$treatment)
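Rather than remembering palette names, you can also save and restore whatever palette was active. When called with a value, palette() invisibly returns the palette that was in effect before the change. A sketch (old_pal is an illustrative name):

```r
# Switch to "Accent" and keep the previous palette
old_pal <- palette("Accent")

boxplot(log(decrease) ~ treatment, data = OrchardSprays,
        col = OrchardSprays$treatment)

# Restore the palette that was active before the switch
palette(old_pal)
```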

To replicate the default color palette used in R 3.x, set the palette name to "R3".

palette("R3")
palette()
## [1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow" 
## [8] "gray"
boxplot(log(decrease) ~ treatment, data = OrchardSprays,
        col = OrchardSprays$treatment)

The palettes in R 4.1 vary in the number of color swatches. The following plot shows all colors available for each palette.


Copyleft Manuel Gimond, 2021