version | |
---|---|
R | 4.3.2 |
7 Relational and boolean operations
You’ve already been exposed to a few examples of relational and boolean operations in earlier chapters. A formal exploration of these techniques follows.
7.1 Relational operations
Relational operations play an important role in data manipulation. Anytime you subset a dataset based on one or more criteria, you are making use of a relational operation. The relational operators (also known as logical binary operators) include ==, !=, <, <=, > and >=. The output of a condition is a logical vector whose elements are TRUE or FALSE.
Relational operator | Syntax | Example |
---|---|---|
Exact equality | == | 3 == 4 -> FALSE |
Exact inequality | != | 3 != 4 -> TRUE |
Less than | < | 3 < 4 -> TRUE |
Less than or equal | <= | 4 <= 4 -> TRUE |
Greater than | > | 3 > 4 -> FALSE |
Greater than or equal | >= | 4 >= 4 -> TRUE |
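Relational operators are applied element by element, so the same comparison can be run against every value in a vector and the result used to subset it. A minimal sketch, assuming a hypothetical numeric vector `temps` that is not part of this chapter's data:

```r
# Hypothetical data used for illustration only
temps <- c(12, 25, 31, 8, 19)

# Each element is compared to 20, giving one logical value per element
warm <- temps > 20
warm

# The logical vector keeps only the elements where the condition is TRUE
temps[warm]
```

Here `warm` has the same length as `temps`, and only the values 25 and 31 survive the subset.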
7.2 Boolean operations
Boolean operations can be used to piece together multiple evaluations.
R has three boolean operators: the AND operator, &; the NOT operator, !; and the OR operator, |.
The & operator requires that the conditions on both sides of the boolean operator be satisfied. You would normally use this operator when addressing a condition along the lines of “x must be satisfied AND y must be satisfied”.
The | operator requires that at least one condition be met on either side of the boolean operator. You would normally use this operator when addressing a condition along the lines of “x must be satisfied OR y must be satisfied”. Note that the output will also be TRUE if both conditions are met.
The ! operator is a negation operator. It will reverse the outcome of a condition. So if the outcome of an expression is TRUE, preceding that expression with ! will reverse the outcome to FALSE and vice-versa.
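Boolean operator | Syntax | Example | Outcome |
---|---|---|---|
AND | & | 4 == 3 & 1 == 1 | FALSE |
AND | & | 4 == 4 & 1 == 1 | TRUE |
OR | | | 4 == 4 | 1 == 1 | TRUE |
OR | | | 4 == 3 | 1 == 1 | TRUE |
OR | | | 4 == 3 | 1 == 2 | FALSE |
NOT | ! | !(4 == 3) | TRUE |
NOT | ! | !(4 == 4) | FALSE |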
The following table breaks down all possible Boolean outcomes where T = TRUE and F = FALSE:
Boolean operation | Outcome |
---|---|
T & T | TRUE |
T & F | FALSE |
F & F | FALSE |
T | T | TRUE |
T | F | TRUE |
F | F | FALSE |
! T | FALSE |
! F | TRUE |
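In practice, boolean operators are most often used to combine relational conditions when subsetting. The following sketch assumes a hypothetical vector `x` chosen only for illustration:

```r
# Hypothetical vector used for illustration only
x <- c(2, 7, 10, 15, 21)

# AND: values greater than 5 and less than 20
x[x > 5 & x < 20]

# OR: values less than 5 or greater than 20
x[x < 5 | x > 20]

# NOT: reverse the outcome of the AND condition
x[!(x > 5 & x < 20)]
```

The first subset returns 7, 10 and 15. The last two return the same values (2 and 21), since negating the AND condition is equivalent to the OR condition written here.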
If the input values to a boolean operation are numeric vectors and not logical vectors, the numeric values will be interpreted as FALSE if zero and TRUE otherwise. For example:
1 & 2
[1] TRUE
1 & 0
[1] FALSE
7.2.1 Pecking order in operations
Note that the operation a == (3 | 4) is not the same as (a == 3) | (a == 4). If, for example, a = 3, the former will return FALSE whereas the latter will return TRUE.
a <- 3
a == (3 | 4)
[1] FALSE
(a == 3) | (a == 4)
[1] TRUE
This is because R applies a pecking order to its operations. In the former case, R first evaluates what is in between the parentheses, (3 | 4).
(3 | 4)
[1] TRUE
This returns TRUE since the numbers on either side of | are converted to TRUE (only values of 0 are converted to FALSE). It then compares a to this logical vector TRUE.
a == TRUE
[1] FALSE
Here, the == operator requires that both sides of the operation be of the same data type. a is numeric and TRUE is logical. Recall from Chapter 3 that R circumvents differences in data types by coercing all values to the highest common mode (see the chapter on data types). Here, the numeric type overrides the logical type, thus coercing the TRUE value to its numeric representation, 1. Hence, the evaluation being performed is:
a == 1
[1] FALSE
When a vector is evaluated for more than one condition, you need to explicitly break down each condition before combining them with boolean operators.
(a == 3) | (a == 4)
[1] TRUE
The above is an example of R’s built-in operation precedence rules. For example, comparison operations such as <= and > are performed before boolean operations such that a == 3 | 4 will first evaluate a == 3 before evaluating ... | 4.
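One consequence of this rule is that an expression of the form a == 3 | 4 returns TRUE no matter what the left-hand value is, because the nonzero 4 is always interpreted as TRUE. A small sketch, using a hypothetical variable `val` so as not to overwrite `a`:

```r
# Hypothetical value that matches neither 3 nor 4
val <- 5

# Parsed as (val == 3) | 4: FALSE | 4, and the nonzero 4 is treated as TRUE
val == 3 | 4
# [1] TRUE

# Spelling out each comparison gives the intended result
val == 3 | val == 4
# [1] FALSE
```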
Even boolean operations follow a pecking order such that ! precedes & which precedes |. For example:
! TRUE & FALSE | TRUE
will first evaluate ! TRUE, then ... & FALSE, then ... | TRUE.
To override R’s built-in precedence, use parentheses. For example:
! TRUE & (FALSE | TRUE)
will first evaluate (FALSE | TRUE) and ! TRUE separately, then their output will be combined with ... & ....
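Running both expressions side by side shows that the parentheses change the outcome and not just the order of evaluation. The comments spell out how each expression is grouped:

```r
# Default precedence: ! first, then &, then |
# Grouped as ((!TRUE) & FALSE) | TRUE
!TRUE & FALSE | TRUE
# [1] TRUE

# Parentheses force the | to be evaluated before the &
# Grouped as (!TRUE) & (FALSE | TRUE)
!TRUE & (FALSE | TRUE)
# [1] FALSE
```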
For a full list of operation precedence, access the help page for Syntax.
?Syntax
The following lists the pecking order from high to low precedence (i.e. top operation is performed before bottom operation).
:: ::: | access variables in a namespace |
$ @ | component / slot extraction |
[ [[ | indexing |
^ | exponentiation (right to left) |
- + | unary minus and plus |
: | sequence operator |
%any% |> | special operators (including %% and %/%) |
* / | multiply, divide |
+ - | (binary) add, subtract |
< > <= >= == != | ordering and comparison |
! | negation |
& && | and |
| || | or |
~ | as in formulae |
-> ->> | rightwards assignment |
<- <<- | assignment (right to left) |
= | assignment (right to left) |
? | help |
7.3 Comparing multidimensional objects
The relational operators are used to compare single elements (i.e. one element at a time). If you want to compare two objects as a whole (e.g. multi-element vectors or data frames), use the identical() function. For example:
a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
[1] TRUE
identical(a, b)
[1] FALSE
identical(mtcars, mtcars)
[1] TRUE
Notice that identical returns a single logical value, regardless of the input object’s dimensions.
Note that the data structure must match as well as its element values. For example, if d is a list and a is an atomic vector, the output of identical will be FALSE even if the internal values match.
d <- list( c(1, 5, 6, 10) )
identical(a, d)
[1] FALSE
If we convert d from a list to an atomic vector using the unlist function (thus matching data structures), we get:
identical(a, unlist(d))
[1] TRUE
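To see how identical() differs from the element-wise == operator, the following sketch compares the two on the same inputs. The integer versus double comparison at the end is an added illustration, not an example from this chapter; it relies on the fact that 1:3 is stored as integer while c(1, 2, 3) is stored as double.

```r
a <- c(1, 5, 6, 10)

# == compares element by element and returns one value per element
a == c(1, 5, 6, 10)

# all() collapses the element-wise result to a single value
all(a == c(1, 5, 6, 10))

# identical() also checks the storage type, so integer and double differ
identical(1:3, c(1, 2, 3))
# [1] FALSE

# ...whereas == coerces both sides to a common type before comparing
1:3 == c(1, 2, 3)
# [1] TRUE TRUE TRUE
```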
7.4 The match operator %in%
The match operator %in% compares two vectors and assesses if an element on the left-hand side of %in% matches any of the elements on the right-hand side of the operator. For each element in the left-hand vector, R returns TRUE if the value is present in any of the right-hand side elements or FALSE if not.
For example, given the following vectors:
v1 <- c( "a", "b", "cd", "fe")
v2 <- c( "b", "e")
Find the elements in v1 that match any of the values in v2.
v1 %in% v2
[1] FALSE TRUE FALSE FALSE
The operator checks whether each element in v1 has a matching value in v2. For example, element "a" in v1 is compared to elements "b" and "e" in v2. No matches are found and a FALSE is returned. The next element in v1, "b", is compared to both elements in v2. This time, there is a match (v2 has an element "b") and TRUE is returned. This process is repeated for all elements in v1.
The logical vector output has the same length as the input vector v1 (four in this example).
If we swap the vector objects, we get a two-element logical vector since we are now comparing each element in v2 to any matching elements in v1.
v2 %in% v1
[1] TRUE FALSE
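Since %in% returns a logical vector with one value per left-hand element, it can be used directly as a subsetting index. A short sketch reusing the v1 and v2 vectors defined above:

```r
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")

# Keep only the elements of v1 that appear in v2
v1[v1 %in% v2]
# [1] "b"

# Negate the result to keep the elements of v1 that do not appear in v2
v1[!(v1 %in% v2)]
# [1] "a"  "cd" "fe"
```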
7.5 Checking if a value is NA
When assessing if a value is equal to NA, the following evaluation may behave unexpectedly.
a <- c(3, 67, 4, NA, 10)
a == NA
[1] NA NA NA NA NA
The output is not the vector of TRUE and FALSE values we would expect from an evaluation: comparing any value to NA returns NA. Instead, you must make use of the is.na() function:
is.na(a)
[1] FALSE FALSE FALSE TRUE FALSE
As another example, if we want to keep all rows in data frame d where z is NA, we would type:
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))
d[is.na(d$z), ]
x y z
1 1 3 NA
2 4 2 NA
You can, of course, use the ! operator to reverse the evaluation and omit all rows where z is NA:
d[!is.na(d$z), ]
x y z
3 2 5 4
4 5 3 9
5 2 8 7
6 3 1 8
7 NA 1 3
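The is.na() output can also be summarized or combined with other conditions using the boolean operators covered earlier. A minimal sketch using the vector a from the start of this section:

```r
a <- c(3, 67, 4, NA, 10)

# TRUE counts as 1 when summed, so this returns the number of NA values
sum(is.na(a))
# [1] 1

# Combine is.na() with a relational operator to keep
# non-missing values greater than 5
a[!is.na(a) & a > 5]
# [1] 67 10
```

The NA element is dropped in the second expression because FALSE & NA evaluates to FALSE, so the missing value is never selected.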