7  Relational and boolean operations

version
R 4.3.2

You’ve already been exposed to a few examples of relational and boolean operations in earlier chapters. A formal exploration of these techniques follows.

7.1 Relational operations

Relational operations play an important role in data manipulation. Any time you subset a dataset based on one or more criteria, you are making use of a relational operation. The relational operators (also known as logical binary operators) include ==, !=, <, <=, > and >=. The output of a comparison is a logical vector whose elements are TRUE or FALSE.

Relational operator      Syntax   Example
Exact equality           ==       3 == 4 -> FALSE
Exact inequality         !=       3 != 4 -> TRUE
Less than                <        3 < 4  -> TRUE
Less than or equal       <=       4 <= 4 -> TRUE
Greater than             >        3 > 4  -> FALSE
Greater than or equal    >=       4 >= 4 -> TRUE
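
Relational operators work element by element, so comparing a vector to a single value returns one logical value per element. The following is a minimal sketch of how such a comparison can drive a subset. The vector name temps and its values are made up for illustration.

```r
# Hypothetical temperature readings (illustrative values only)
temps <- c(12, 25, 31, 18, 25)

# Each element of temps is compared to 25 in turn
is_warm <- temps >= 25
is_warm            # FALSE  TRUE  TRUE FALSE  TRUE

# The logical vector keeps only the elements flagged TRUE
temps[is_warm]     # 25 31 25
```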

7.2 Boolean operations

Boolean operations can be used to piece together multiple evaluations.

R has three boolean operators: the AND operator, &; the NOT operator, !; and the OR operator, |.

The & operator requires that the conditions on both sides of the boolean operator be satisfied. You would normally use this operator when addressing a condition along the lines of “x must be satisfied AND y must be satisfied”.

The | operator requires that at least one condition be met on either side of the boolean operator. You would normally use this operator when addressing a condition along the lines of “x must be satisfied OR y must be satisfied”. Note that the output will also be TRUE if both conditions are met.

The ! operator is a negation operator. It will reverse the outcome of a condition. So if the outcome of an expression is TRUE, preceding that expression with ! will reverse the outcome to FALSE and vice-versa.

Boolean operator   Syntax   Example            Outcome
AND                &        4 == 3 & 1 == 1    FALSE
AND                &        4 == 4 & 1 == 1    TRUE
OR                 |        4 == 4 | 1 == 1    TRUE
OR                 |        4 == 3 | 1 == 1    TRUE
OR                 |        4 == 3 | 1 == 2    FALSE
NOT                !        !(4 == 3)          TRUE
NOT                !        !(4 == 4)          FALSE

The following table breaks down all possible Boolean outcomes where T = TRUE and F = FALSE:

Boolean operation Outcome
T & T TRUE
T & F FALSE
F & F FALSE
T | T TRUE
T | F TRUE
F | F FALSE
!T FALSE
!F TRUE
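
Like the relational operators, &, | and ! operate element by element on vectors. The sketch below applies them to a made-up vector x (the name and values are assumptions chosen for illustration) to show how two conditions can be combined.

```r
# Hypothetical values (illustrative only)
x <- c(2, 7, 10, 15)

# AND: both conditions must hold for an element to be TRUE
in_range <- x > 5 & x < 12
in_range     # FALSE  TRUE  TRUE FALSE

# OR: at least one condition must hold
outside <- x <= 5 | x >= 12
outside      #  TRUE FALSE FALSE  TRUE

# NOT: reverses each element of in_range
not_range <- !in_range
not_range    #  TRUE FALSE FALSE  TRUE
```

Here, negating the AND condition gives the same result as the OR condition because the two describe complementary sets of values.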

If the input values to a boolean operation are numeric vectors and not logical vectors, the numeric values will be interpreted as FALSE if zero and TRUE otherwise. For example:

1 & 2
[1] TRUE
1 & 0
[1] FALSE
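
The same coercion rule can be checked directly with as.logical(). This short sketch uses an arbitrary numeric vector, nums, to show that only zero maps to FALSE. Negative and fractional values map to TRUE.

```r
# Arbitrary numeric values, including a negative and a fraction
nums <- c(-1, 0, 0.5, 2)

as_lgl  <- as.logical(nums)   # explicit coercion
negated <- !nums              # implicit coercion, then negation

as_lgl     #  TRUE FALSE  TRUE  TRUE
negated    # FALSE  TRUE FALSE FALSE
```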

7.2.1 Pecking order in operations

Note that the operation a == (3 | 4) is not the same as (a == 3) | (a == 4). If, for example, a = 3, the former will return FALSE whereas the latter will return TRUE.

a <- 3
a == (3 | 4)
[1] FALSE
(a == 3) | (a == 4)
[1] TRUE

This is because R applies a pecking order to its operations. In the former case, R is first evaluating what is in between the parentheses, (3 | 4).

(3 | 4)
[1] TRUE

This returns TRUE since the numbers on either side of | are converted to TRUE (only values of 0 are converted to FALSE). It then compares a to this logical vector TRUE.

a == TRUE
[1] FALSE

Here, the == operator requires that both sides of the operation be of the same data type. a is numeric and TRUE is logical. Recall from Chapter 3 that R resolves differences in data types by coercing all values to the highest common mode. Here, numeric overrides logical, so the TRUE value is coerced to its numeric representation, 1. Hence, the evaluation being performed is:

a == 1
[1] FALSE

When a vector is evaluated for more than one condition, you need to explicitly break down each condition before combining them with boolean operators.

(a == 3) | (a == 4)
[1] TRUE

The above is an example of R’s built-in operation precedence rules. For example, comparison operations such as <= and > are performed before boolean operations such that a == 3 | 4 will first evaluate a == 3 before evaluating ... | 4.
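
A consequence of this rule is that an expression such as a == 3 | 4 returns TRUE no matter what a holds, because the non-zero value 4 is always coerced to TRUE. The sketch below illustrates this with a set to 5, a value chosen here because it matches neither 3 nor 4.

```r
# a is set to 5 for this illustration (it equals neither 3 nor 4)
a <- 5

# Parsed as (a == 3) | 4, i.e. FALSE | TRUE
r_unparenthesized <- a == 3 | 4
r_unparenthesized    # TRUE

# Each condition spelled out explicitly
r_explicit <- (a == 3) | (a == 4)
r_explicit           # FALSE
```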

Even boolean operations follow a pecking order such that ! precedes & which precedes |. For example:

! TRUE & FALSE | TRUE

will first evaluate ! TRUE, then ... & FALSE, then ... | TRUE.

To override R’s built-in precedence, use parentheses. For example:

! TRUE & (FALSE | TRUE)

will first evaluate (FALSE | TRUE) and ! TRUE separately, then their output will be combined with ... & ....
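
Running both expressions side by side shows that the parentheses change the outcome. The variable names in this sketch are chosen here for illustration.

```r
# Default precedence: ((!TRUE) & FALSE) | TRUE
r_default <- ! TRUE & FALSE | TRUE
r_default    # TRUE

# Parentheses force the OR to be evaluated first: (!TRUE) & (FALSE | TRUE)
r_parens <- ! TRUE & (FALSE | TRUE)
r_parens     # FALSE
```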

For a full list of operation precedence, access the help page for Syntax.

?Syntax

The following lists the pecking order from high to low precedence (i.e. top operation is performed before bottom operation).

:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% |> special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
< > <= >= == != ordering and comparison
! negation
& && and
| || or
~ as in formulae
-> ->> rightwards assignment
<- <<- assignment (right to left)
= assignment (right to left)
? help

7.3 Comparing multidimensional objects

The relational operators are used to compare single elements (i.e. one element at a time). If you want to compare two objects as a whole (e.g. multi-element vectors or data frames), use the identical() function. For example:

a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
[1] TRUE
identical(a, b)
[1] FALSE
identical(mtcars, mtcars)
[1] TRUE

Notice that identical returns a single logical value, regardless of the input objects’ dimensions.

Note that the data structure must match as well as its element values. For example, if d is a list and a is an atomic vector, the output of identical will be FALSE even if the internal values match.

d <- list( c(1, 5, 6, 10) )
identical(a, d)
[1] FALSE

If we convert d from a list to an atomic vector using the unlist function (thus matching data structures), we get:

identical(a, unlist(d))
[1] TRUE
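
To contrast the two approaches, the sketch below compares the same vector with == and with identical(). It also compares a against an integer copy, a_int (a name introduced here for illustration), to show that identical() checks the storage type as well as the values.

```r
a <- c(1, 5, 6, 10)

# == returns one logical value per element
elementwise <- a == a
elementwise            # TRUE TRUE TRUE TRUE

# identical() returns a single logical value
whole <- identical(a, a)
whole                  # TRUE

# Same values stored as integers instead of doubles
a_int <- c(1L, 5L, 6L, 10L)
all(a == a_int)        # TRUE: the values match element by element
identical(a, a_int)    # FALSE: double and integer types differ
```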

7.4 The match operator %in%

The match operator %in% compares two vectors and assesses if an element on the left-hand side of %in% matches any of the elements on the right-hand side of the operator. For each element in the left-hand vector, R returns TRUE if the value is present in any of the right-hand side elements or FALSE if not.

For example, given the following vectors:

v1 <- c( "a", "b", "cd", "fe")
v2 <- c( "b", "e")

find the elements in v1 that match any of the values in v2.

v1 %in% v2
[1] FALSE  TRUE FALSE FALSE

The operator checks whether each element in v1 has a matching value in v2. For example, element "a" in v1 is compared to elements "b" and "e" in v2. No matches are found and a FALSE is returned. The next element in v1, "b", is compared to both elements in v2. This time, there is a match (v2 has an element "b") and TRUE is returned. This process is repeated for all elements in v1.

The logical vector output has the same length as the input vector v1 (four in this example).

If we swap the vector objects, we get a two-element logical vector since we are now comparing each element in v2 to any matching elements in v1.

v2 %in% v1
[1]  TRUE FALSE
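
Because %in% returns a logical vector, it can be used to subset, and it offers a compact alternative to chaining several == tests with |. The sketch below reuses v1 and v2 from above. The names matched, unmatched and is_3_or_4 are introduced here for illustration.

```r
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")

# Keep the elements of v1 that appear in v2
matched <- v1[v1 %in% v2]
matched      # "b"

# Negate the test to keep the elements that do not appear in v2
unmatched <- v1[!(v1 %in% v2)]
unmatched    # "a"  "cd" "fe"

# Equivalent to (a == 3) | (a == 4)
a <- 3
is_3_or_4 <- a %in% c(3, 4)
is_3_or_4    # TRUE
```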

7.5 Checking if a value is NA

When assessing if a value is equal to NA the following evaluation may behave unexpectedly.

a <- c(3, 67, 4, NA, 10)
a == NA
[1] NA NA NA NA NA

The output is not the vector of TRUE and FALSE values we would expect from such an evaluation: comparing any value to NA returns NA. Instead, you must make use of the is.na() function:

is.na(a)
[1] FALSE FALSE FALSE  TRUE FALSE

As another example, if we want to keep all rows in dataframe d where z = NA, we would type:

d <- data.frame(x = c(1,4,2,5,2,3,NA), 
                y = c(3,2,5,3,8,1,1), 
                z = c(NA,NA,4,9,7,8,3))

d[ is.na(d$z), ]
  x y  z
1 1 3 NA
2 4 2 NA

You can, of course, use the ! operator to reverse the evaluation and omit all rows where z = NA:

d[ !is.na(d$z), ]
   x y z
3  2 5 4
4  5 3 9
5  2 8 7
6  3 1 8
7 NA 1 3
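
The output of is.na() can be combined with the other operators covered in this chapter. The following sketch, which reuses the vector a from above (the names n_missing and big are introduced here for illustration), counts the missing values and then keeps only the non-missing values greater than 5.

```r
a <- c(3, 67, 4, NA, 10)

# TRUE counts as 1 when summed, so this tallies the missing values
n_missing <- sum(is.na(a))
n_missing    # 1

# Combine the NA check with a relational test
big <- a[!is.na(a) & a > 5]
big          # 67 10
```

Placing !is.na(a) in the condition keeps the NA element out of the result. Without it, a[a > 5] would return an NA alongside the matching values.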