You’ve already been exposed to a few examples of relational and boolean operations in earlier tutorials. A formal exploration of these techniques follows.

Relational operations play an important role in data manipulation. Anytime you subset a dataset based on one or more criteria, you are making use of a relational operation. The relational operators (also known as *logical binary operators*) include `==`, `!=`, `<`, `<=`, `>` and `>=`. The output of a condition is a logical vector of `TRUE` or `FALSE` values.

Relational operator | Syntax | Example | Outcome |
---|---|---|---|
Exact equality | `==` | `3 == 4` | `FALSE` |
Exact inequality | `!=` | `3 != 4` | `TRUE` |
Less than | `<` | `3 < 4` | `TRUE` |
Less than or equal | `<=` | `4 <= 4` | `TRUE` |
Greater than | `>` | `3 > 4` | `FALSE` |
Greater than or equal | `>=` | `4 >= 4` | `TRUE` |
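Because R's relational operators are vectorized, comparing a vector to a single value returns one logical value per element, and that logical vector can be used directly to subset the original vector. The short sketch below uses a made-up vector `x` purely for illustration.

```r
# A made-up numeric vector used only for illustration
x <- c(2, 7, 4, 9, 4)

# Each element of x is compared with 4; the result has one value per element
x > 4      # FALSE  TRUE FALSE  TRUE FALSE
x == 4     # FALSE FALSE  TRUE FALSE  TRUE

# The logical vector can be used to subset x (keep elements where TRUE)
x[x > 4]   # 7 9
x[x != 4]  # 2 7 9
```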

Boolean operations can be used to piece together multiple evaluations.

R has three basic boolean operators: the **AND** operator, `&`; the **NOT** operator, `!`; and the **OR** operator, `|`.

The `&` operator requires that the conditions on both sides of the boolean operator be satisfied. You would normally use this operator when addressing a question along the lines of *“x must be satisfied AND y must be satisfied”*.

The `|` operator requires that at least one of the conditions on either side of the boolean operator be met. You would normally use this operator when addressing a question along the lines of “`x` must be satisfied OR `y` must be satisfied”. Note that the output will also be `TRUE` if *both* conditions are met.

The `!` operator is a *negation* operator. It reverses the outcome of a condition and can be interpreted as *“I do NOT want x to be true”*. So if the outcome of an expression is `TRUE`, preceding that expression with `!` will reverse the outcome to `FALSE`, and vice versa.

Boolean operator | Syntax | Example | Outcome |
---|---|---|---|
AND | `&` | `4 == 3 & 1 == 1` | `FALSE` |
AND | `&` | `4 == 4 & 1 == 1` | `TRUE` |
OR | `\|` | `4 == 4 \| 1 == 1` | `TRUE` |
OR | `\|` | `4 == 3 \| 1 == 1` | `TRUE` |
OR | `\|` | `4 == 3 \| 1 == 2` | `FALSE` |
NOT | `!` | `!(4 == 3)` | `TRUE` |
NOT | `!` | `!(4 == 4)` | `FALSE` |
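Boolean operators are most useful when several relational conditions are combined to subset a data frame. The sketch below builds a small, hypothetical data frame `df` (its columns `id`, `x` and `y` are invented for this example) and applies each of the three operators.

```r
# A small hypothetical data frame used only for illustration
df <- data.frame(id = 1:5,
                 x  = c(1, 8, 3, 10, 6),
                 y  = c("a", "b", "a", "b", "a"))

# AND: x greater than 2 AND y equal to "a" (rows with id 3 and 5)
df[df$x > 2 & df$y == "a", ]

# OR: x less than 2 OR x greater than 9 (rows with id 1 and 4)
df[df$x < 2 | df$x > 9, ]

# NOT: rows where y is NOT equal to "a" (rows with id 2 and 4)
df[!(df$y == "a"), ]
```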

The following table breaks down all possible Boolean outcomes, where `T` = `TRUE` and `F` = `FALSE`:

Boolean operation | Outcome |
---|---|
`T & T` | `TRUE` |
`T & F` | `FALSE` |
`F & T` | `FALSE` |
`F & F` | `FALSE` |
`T \| T` | `TRUE` |
`T \| F` | `TRUE` |
`F \| T` | `TRUE` |
`F \| F` | `FALSE` |
`!T` | `FALSE` |
`!F` | `TRUE` |
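If you want to check the table above yourself, one option is to let R build it. This sketch uses base R's `expand.grid()` to generate every combination of two logical values, here named `A` and `B` for illustration, and then applies each operator element-wise.

```r
# Every combination of two logical values (A varies fastest)
tt <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# Apply each boolean operator element-wise, row by row
tt$A_and_B <- tt$A & tt$B
tt$A_or_B  <- tt$A | tt$B
tt$not_A   <- !tt$A

print(tt)
```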

If the input values to a boolean operation are numeric vectors rather than logical vectors, the numeric values are interpreted as `FALSE` if zero and `TRUE` otherwise. For example:

`1 & 2`

`[1] TRUE`

`1 & 0`

`[1] FALSE`
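The same zero/non-zero rule applies element by element to longer numeric vectors. The sketch below, using an arbitrary vector `n`, shows the rule with `as.logical()` and with the `&` and `!` operators; note that negative and fractional values also count as `TRUE`.

```r
n <- c(-2, 0, 0.5, 3)   # arbitrary numeric values for illustration

as.logical(n)   # TRUE FALSE TRUE TRUE  (only 0 becomes FALSE)
n & TRUE        # TRUE FALSE TRUE TRUE  (same coercion inside &)
!n              # FALSE TRUE FALSE FALSE
```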

Note that the operation `a == (3 | 4)` is **not** the same as `(a == 3) | (a == 4)`. If `a` is 3, the former returns `FALSE` whereas the latter returns `TRUE`. This is because the Boolean operator evaluates both sides of its expression as separate **logical** outcomes (i.e. `T` and `F` values). In the latter case, the Boolean expression is asking *“is a equal to 3 OR is a equal to 4”*. Since one of the conditions is true, the expression ends up evaluating `TRUE | FALSE`, which returns `TRUE` (see above table).

```
a <- 3
b <- 4
(a == 3) | (a == 4)
```

`[1] TRUE`

In the former expression, the boolean operator `|` is evaluating `3` OR `4` on its right-hand side. As mentioned above, numeric values are interpreted as `FALSE` if zero and `TRUE` otherwise, so when evaluating `3 | 4`, R is really seeing `TRUE | TRUE`, which, according to the aforementioned table, outputs `TRUE`.

`3 | 4`

`[1] TRUE`

So in the end, the expression `a == (3 | 4)` is really evaluating the condition `a == TRUE`, which returns `FALSE`: for the comparison, `TRUE` is coerced to the number `1`, and 3 is not equal to 1.

`a == (3 | 4)`

`[1] FALSE`
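The pitfall is easier to see with a vector. In the sketch below (the vector `a` is made up for illustration), `a == (3 | 4)` flags only the element equal to `1`, since `TRUE` is coerced to `1`, while the correctly written condition flags the elements equal to 3 or 4. The `%in%` operator, covered further down, gives the same result as the correct form more compactly.

```r
a <- c(1, 3, 4, 5)   # made-up values for illustration

a == (3 | 4)          # compares a with TRUE (i.e. 1): TRUE FALSE FALSE FALSE
(a == 3) | (a == 4)   # correct form:                  FALSE TRUE TRUE FALSE
a %in% c(3, 4)        # same result as the correct form
```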

The relational operators compare objects element by element (i.e. one element at a time). If you want to compare two objects as a whole (e.g. multi-element vectors or data frames), use the `identical()` function. For example:

```
a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
```

`[1] TRUE`

`identical(a, b)`

`[1] FALSE`

`identical(mtcars, mtcars)`

`[1] TRUE`

Notice that `identical` returns a single logical value, regardless of the input object’s dimensions.
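The contrast with the relational operators can be sketched as follows: `==` returns one logical value per element, and wrapping it in `all()` is one way to collapse that into a single answer, whereas `identical()` returns a single value directly. The vectors below repeat `a` and `b` from the example above.

```r
a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)

a == a            # element-wise: TRUE TRUE TRUE TRUE
all(a == a)       # collapsed to a single TRUE
identical(a, a)   # a single TRUE

identical(a, b)   # FALSE: the lengths differ
a[1:3] == b       # element-wise on the first three values: TRUE TRUE TRUE
```

Writing `a == b` directly would recycle the shorter vector, and R would warn that the longer object length is not a multiple of the shorter one; `identical()` avoids that ambiguity.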

Note that the data structure must match as well as the element values. For example, if `d` is a list and `a` is an atomic vector, the output of `identical` will be `FALSE` even if the internal values match.

```
d <- list(c(1, 5, 6, 10))
identical(a, d)
```

`[1] FALSE`

If we convert `d` from a list to an atomic vector using the `unlist` function (thus matching data structures), we get:

`identical(a, unlist(d))`

`[1] TRUE`
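Beyond the data structure, `identical()` also checks the storage type. In the sketch below, a double vector and an integer vector (the `L` suffix creates integers) hold the same numbers: `==` reports element-wise equality, but `identical()` returns `FALSE` until the types match.

```r
a <- c(1, 5, 6, 10)       # double
i <- c(1L, 5L, 6L, 10L)   # integer, same values

a == i                        # TRUE TRUE TRUE TRUE
identical(a, i)               # FALSE: double vs integer
identical(a, as.numeric(i))   # TRUE once both are double
```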

`%in%`

The match operator `%in%` compares two vectors and assesses whether each element on the left-hand side of `%in%` matches any of the elements on the right-hand side of the operator. For each element in the left-hand vector, R returns `TRUE` if the value is present in any of the right-hand side elements, or `FALSE` if not.

For example, given the following vectors:

```
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")
```

To find the elements in `v1` that match any of the values in `v2`, type:

`v1 %in% v2`

`[1] FALSE TRUE FALSE FALSE`

The function checks whether each element in `v1` has a matching value in `v2`. For example, element `"a"` in `v1` is compared to elements `"b"` and `"e"` in `v2`. No matches are found and a `FALSE` is returned. The next element in `v1`, `"b"`, is compared to both elements in `v2`. This time there is a match (`v2` has an element `"b"`) and `TRUE` is returned. This process is repeated for all elements in `v1`.
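The element-by-element process just described can be sketched with an explicit loop. The helper `in_set()` below is hypothetical, written only to mirror that description; it is not how R implements `%in%`, and it does not reproduce `%in%`'s handling of `NA`.

```r
# Hypothetical helper mirroring the process described above
in_set <- function(x, table) {
  out <- logical(length(x))       # one result per element of x
  for (i in seq_along(x)) {
    # TRUE if x[i] equals at least one element of table
    out[i] <- any(x[i] == table)
  }
  out
}

v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")

in_set(v1, v2)   # FALSE TRUE FALSE FALSE, same as v1 %in% v2
```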

The logical vector output has the same length as the input vector `v1` (four in this example).

If we swap the vector objects, we get a two-element logical vector since we are now comparing each element in `v2` to any matching elements in `v1`.

`v2 %in% v1`

`[1] TRUE FALSE`
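In practice, the logical output of `%in%` is usually used to subset a vector or data frame. The sketch below reuses `v1` and `v2`; note that `"fe"` is not matched even though it contains the letter `"e"`, because `%in%` compares whole elements rather than substrings. The last line checks the equivalence with `match()` that R's help page for `%in%` describes.

```r
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")

v1[v1 %in% v2]    # "b"
v1[!v1 %in% v2]   # "a" "cd" "fe"  (! applies to the whole v1 %in% v2)

# %in% gives the same answer as match() with nomatch = 0
identical(v1 %in% v2, match(v1, v2, nomatch = 0) > 0)   # TRUE
```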

`NA`

When assessing whether a value is equal to `NA`, the following evaluation may behave unexpectedly.

```
a <- c(3, 67, 4, NA, 10)
a == NA
```

`[1] NA NA NA NA NA`

The output is not the `TRUE`/`FALSE` result we would expect from an evaluation: any comparison involving `NA` (a missing, i.e. unknown, value) returns `NA`. Instead, you must make use of the `is.na()` function:

`is.na(a)`

`[1] FALSE FALSE FALSE TRUE FALSE`
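A few related idioms are sketched below with the same vector `a`: `NA == NA` is itself `NA`; `is.na()` combines with `sum()` and `which()` to count and locate missing values; and a plain relational subset such as `a[a > 5]` keeps an `NA`, which is one reason to combine the condition with `!is.na()`.

```r
a <- c(3, 67, 4, NA, 10)

NA == NA              # NA: comparing two unknowns is still unknown
sum(is.na(a))         # 1  (number of missing values)
which(is.na(a))       # 4  (position of the missing value)

a[a > 5]              # 67 NA 10: the NA comparison produces an NA element
a[!is.na(a) & a > 5]  # 67 10: FALSE & NA is FALSE, so the NA is dropped
```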

As another example, if we want to keep all rows in data frame `d` where `z` = `NA`, we would type:

```
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))
d[is.na(d$z), ]
```

```
x y z
1 1 3 NA
2 4 2 NA
```

You can, of course, use the `!` operator to reverse the evaluation and *omit* all rows where `z` = `NA`:

`d[!is.na(d$z), ]`

```
x y z
3 2 5 4
4 5 3 9
5 2 8 7
6 3 1 8
7 NA 1 3
```
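The same idea extends to several columns. The sketch below rebuilds `d` and keeps only the rows with no missing `x` or `z` by combining two `!is.na()` conditions with `&`; base R's `complete.cases()` returns the same logical vector here, checking every column at once.

```r
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))

# Keep rows where neither x nor z is NA (rows 3 to 6)
d[!is.na(d$x) & !is.na(d$z), ]

# complete.cases() flags rows with no NA in any column
complete.cases(d)     # FALSE FALSE TRUE TRUE TRUE TRUE FALSE
d[complete.cases(d), ]
```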