This tutorial makes use of the following R package(s): dplyr

We can use dplyr’s join operations to join elelments from one table to another table. Four such functions are left_join, right_join, inner_join, and full join.

To demonstrate these functions, we’ll work off of two dataframes: df (the data frame we are joining to) and dj (the table being joined to df).

library(dplyr)

df <- data.frame( x = c(1, 23, 4, 43, 2, 17),
                  y = c("a", "b", "b", "b", "a", "d"))

dj <- data.frame( z = c("apple", "pear", "orange"),
                  y = c("a", "b", "c"))

We will join both tables by the common column y.

Left join

If a join element in df does not exist in dj, NA will be assigned to column z.

left_join(df, dj, by="y")
   x y     z
1  1 a apple
2 23 b  pear
3  4 b  pear
4 43 b  pear
5  2 a apple
6 17 d  <NA>
left join figure

left join figure

Right join

If a join element in df does not exist in dj, that element is removed from the output. Also note that all elements in dj appear in the output (even if they don’t have a match in df in which case an NA value is added). Note too that the order matches the order in which the y elements appear in dj.

right_join(df, dj, by="y")
   x y      z
1  1 a  apple
2  2 a  apple
3 23 b   pear
4  4 b   pear
5 43 b   pear
6 NA c orange
right join figure

right join figure

Inner join

Only matching elements in df and dj are saved in the output.

inner_join(df, dj, by="y")
   x y     z
1  1 a apple
2 23 b  pear
3  4 b  pear
4 43 b  pear
5  2 a apple
inner join figure

inner join figure

Full join

All elements in df and dj are present in the output. For non-matching pairs, NA values are supplied.

full_join(df, dj, by="y")
   x y      z
1  1 a  apple
2 23 b   pear
3  4 b   pear
4 43 b   pear
5  2 a  apple
6 17 d   <NA>
7 NA c orange
full join figure

full join figure