5  Working with Dates

Package     Version
lubridate   1.9.3
stringr     1.5.1

Date values can be stored in tables as numbers or character strings. But for R to interpret them as dates, they must be converted to a Date object or, when a time component is present, to a POSIXct/POSIXt object. R provides many facilities to convert and manipulate dates and times, but a package called lubridate makes working with dates/times much easier.


5.1 Creating date/time objects


5.1.1 From complete date strings

You can convert many representations of date and time to date objects. For example, let’s create a vector of dates represented as month/day/year character strings:

x <- c("06/23/2013", "06/30/2013", "07/12/2014")
class(x)
[1] "character"

At this point, R treats the vector x as characters. To force R to interpret these as dates, use lubridate’s mdy function. mdy will convert date strings where the date elements are ordered as month, day and year.

library(lubridate)
x.date <- mdy(x)
class(x.date)
[1] "Date"

Note that using the mode or typeof functions will not help us determine if the object is an R date object. This is because a date is stored as a numeric (double) internally. Use the class function instead as shown in the above code chunk.
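As a quick illustration of the point above, the following sketch contrasts typeof and class on a small date vector. The object name d is made up for this example.

```r
library(lubridate)

d <- mdy(c("06/23/2013", "06/30/2013"))

typeof(d)    # "double": the storage type, which says nothing about dates
class(d)     # "Date": the class R uses to interpret the numbers as dates
unclass(d)   # the underlying numbers: days elapsed since 1970-01-01
```

Stripping the class with unclass exposes the day counts that R stores behind the scenes.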

If you need to specify the time zone, add the parameter tz=. For example, to specify Eastern Standard Time, type:

x.date <- mdy(x, tz="EST")
x.date
[1] "2013-06-23 EST" "2013-06-30 EST" "2014-07-12 EST"

The mdy function can read in date formats that use different delimiters so that mdy("06/23/2013"), mdy("06-23-2013") and mdy("06.23.2013") are parsed exactly the same so long as the order remains month/day/year.

For different month/day/year arrangements, other lubridate functions need to be used:

Function   Date format
dmy()      day/month/year
ymd()      year/month/day
ydm()      year/day/month
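The sketch below applies each of these functions to the same calendar date written in a different element order. The object names d1 through d4 are made up for this example, and the last line shows what is likely to happen when the function does not match the string’s layout.

```r
library(lubridate)

# The same calendar date written in four different element orders
d1 <- mdy("06/23/2013")
d2 <- dmy("23/06/2013")
d3 <- ymd("2013/06/23")
d4 <- ydm("2013/23/06")

d1 == d2 & d2 == d3 & d3 == d4   # TRUE

# Using the wrong function fails: there is no month 23,
# so mdy() returns NA (along with a parsing warning)
suppressWarnings(mdy("23/06/2013"))
```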

If your data contains both date and time in a “month/day/year hour:minutes:seconds” format use the mdy_hms function.

x <- c("06/23/2013 03:45:23", "07/30/2013 14:23:00", "08/12/2014 18:01:59")
x.date <- mdy_hms(x, tz="EST")
x.date
[1] "2013-06-23 03:45:23 EST" "2013-07-30 14:23:00 EST" "2014-08-12 18:01:59 EST"

The characters _h, _hm or _hms can be appended to any of the four date function names described earlier to accommodate time formats. A few examples follow:

mdy_h("6/23/2013 3", tz="EST") 
[1] "2013-06-23 03:00:00 EST"
dmy_hm("23/6/2013 3:15", tz="EST")
[1] "2013-06-23 03:15:00 EST"
ymd_hms("2013/06/23 3:15:7", tz="EST")
[1] "2013-06-23 03:15:07 EST"

Note that adding a time element to the date object will create POSIXct and POSIXt object classes instead of Date object classes.

class(x.date)
[1] "POSIXct" "POSIXt" 

Also, if a timezone is not explicitly defined for a time-based date, the function assigns UTC (Coordinated Universal Time).

dmy_hm("23/6/2013 3:15")
[1] "2013-06-23 03:15:00 UTC"

5.1.2 Setting and modifying timezones

R does not maintain its own list of timezone names; instead, it relies on the operating system’s naming convention. To list the supported timezone names for your particular R environment, type:

OlsonNames()

For example, to use US Eastern time with daylight saving time adjustments, set tz = "EST5EDT".

x.date <- mdy_hms(x, tz="EST5EDT")
x.date
[1] "2013-06-23 03:45:23 EDT" "2013-07-30 14:23:00 EDT" "2014-08-12 18:01:59 EDT"
class(x.date)
[1] "POSIXct" "POSIXt" 
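To see why the choice between "EST" and "EST5EDT" matters, the following sketch parses the same clock reading under both time zones. The object names t_est and t_edt are made up for this example, and it assumes both zone names appear in your OlsonNames() list.

```r
library(lubridate)

t_est <- mdy_hms("06/23/2013 03:45:23", tz = "EST")      # fixed UTC-5 offset
t_edt <- mdy_hms("06/23/2013 03:45:23", tz = "EST5EDT")  # EDT (UTC-4) in June

# Same clock reading, but the two instants are one hour apart
difftime(t_est, t_edt, units = "hours")
```

For summer timestamps recorded on a clock that follows daylight saving time, "EST5EDT" is therefore the safer choice of the two.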

If you need to convert the date/time to another timezone, use lubridate’s with_tz() function. For example, to convert x.date from its current EST5EDT timezone to the US/Alaska time zone, type:

with_tz(x.date, tzone = "US/Alaska") 
[1] "2013-06-22 23:45:23 AKDT" "2013-07-30 10:23:00 AKDT" "2014-08-12 14:01:59 AKDT"

Note that the with_tz function will change the timestamp to reflect the new time zone. If you simply want to change the time zone definition and not the timestamp, use the tz() function.

tz(x.date) <- "US/Alaska"
x.date
[1] "2013-06-23 03:45:23 AKDT" "2013-07-30 14:23:00 AKDT" "2014-08-12 18:01:59 AKDT"
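The difference between the two approaches can be checked by comparing the underlying instants. The sketch below uses lubridate’s force_tz() function which, to the best of my understanding, has the same effect as the tz() assignment shown above. The object names are made up for this example.

```r
library(lubridate)

t_east <- ymd_hms("2013-06-23 03:45:23", tz = "EST5EDT")

# with_tz(): same instant, displayed on an Alaskan clock
t_view <- with_tz(t_east, tzone = "US/Alaska")

# force_tz(): same clock reading, relabeled as Alaskan time (a new instant)
t_forced <- force_tz(t_east, tzone = "US/Alaska")

t_view == t_east      # TRUE
t_forced == t_east    # FALSE
difftime(t_forced, t_east, units = "hours")   # 4 hours later
```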

5.1.3 From separate date elements

If your data table splits the date elements into separate vector objects or columns, use the paste function to combine the elements into a single date string before passing it to one of the lubridate functions. Let’s look at an example:

dat1 <- read.csv("http://mgimond.github.io/ES218/Data/CO2.csv")
head(dat1)
  Year Month Average Interpolated  Trend Daily_mean
1 1959     1  315.62       315.62 315.70         -1
2 1959     2  316.38       316.38 315.88         -1
3 1959     3  316.71       316.71 315.62         -1
4 1959     4  317.72       317.72 315.56         -1
5 1959     5  318.29       318.29 315.50         -1
6 1959     6  318.15       318.15 315.92         -1

The CO2 dataset has the date split across two columns: Year and Month (both stored as integers). You can combine the columns into a character string using the paste function. For example, if we want to create a “Year-Month” string as in 1959-10, we could type:

paste(dat1$Year,dat1$Month, sep="-")

The above example uses three arguments: the two objects that are pasted together (i.e. Year and Month) and the sep="-" parameter which fills the gap between both objects with a dash - (by default, paste would have added spaces thus creating strings in the form of 1959 10).
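A minimal sketch of the separator’s effect, using two short made-up vectors (yr and mo) in place of the data table columns:

```r
yr <- c(1959, 1959)
mo <- c(9, 10)

paste(yr, mo)              # "1959 9"  "1959 10"  (default separator is a space)
paste(yr, mo, sep = "-")   # "1959-9"  "1959-10"
```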

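Older versions of lubridate did not have a function along the lines of ym to convert year-month strings (recent versions provide ym(), which assigns the first day of the month). Here, we will add an artificial day of the month to the string ourselves so that each value falls mid-month. We’ll choose to add the 15th day of the month as in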

paste(dat1$Year, dat1$Month, "15", sep="-")

We are now ready to add a new column called Date to the dat1 object and fill that column with a real date object:

dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15", sep="-") )
head(dat1)
  Year Month Average Interpolated  Trend Daily_mean       Date
1 1959     1  315.62       315.62 315.70         -1 1959-01-15
2 1959     2  316.38       316.38 315.88         -1 1959-02-15
3 1959     3  316.71       316.71 315.62         -1 1959-03-15
4 1959     4  317.72       317.72 315.56         -1 1959-04-15
5 1959     5  318.29       318.29 315.50         -1 1959-05-15
6 1959     6  318.15       318.15 315.92         -1 1959-06-15

The sep="-" option is not needed with the lubridate function since lubridate will recognize a space as a valid delimiter, so the last piece of code could have been written as:

dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15") )
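If you would rather skip the string step altogether, lubridate’s make_date() function builds dates directly from numeric elements. The sketch below is one possible alternative and not the approach used in this chapter; the small data frame df is a hypothetical stand-in for the CO2 table.

```r
library(lubridate)

# Hypothetical stand-in for a few rows of the CO2 table
df <- data.frame(Year = c(1959L, 1959L, 1960L), Month = c(1L, 12L, 2L))

# String-based approach described above
d_paste <- ymd(paste(df$Year, df$Month, "15"))

# Numeric approach: no pasting or parsing required
d_make <- make_date(year = df$Year, month = df$Month, day = 15L)

all(d_paste == d_make)   # TRUE
```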

To confirm that the Date column is indeed stored as a date object, type:

str(dat1)
'data.frame':   721 obs. of  7 variables:
 $ Year        : int  1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
 $ Month       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Average     : num  316 316 317 318 318 ...
 $ Interpolated: num  316 316 317 318 318 ...
 $ Trend       : num  316 316 316 316 316 ...
 $ Daily_mean  : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Date        : Date, format: "1959-01-15" "1959-02-15" "1959-03-15" "1959-04-15" ...

or you could type,

class(dat1$Date)
[1] "Date"

Since we did not add a timezone or a time component to the date object the Date column was assigned a Date class as opposed to the POSIX... class.

5.1.4 Padding time values

The lubridate functions may expect the time values to consist of a specific number of characters if a delimiter such as : is not present to split the time elements. For example, the following will not generate a valid date/time object:

hrmin <- 712           # Time 7:12
date  <- "2018/03/17"  # Date 
ymd_hm(paste(date, hrmin))
[1] NA

One solution is to pad the time element with zeros to complete a four-character string (or a six-character string if seconds are part of the time element). We can use the str_pad function from the stringr package to pad the time object (the stringr package is covered in chapter 6).

library(stringr)
ymd_hm(paste(date, str_pad(hrmin, width=4, pad="0")))
[1] "2018-03-17 07:12:00 UTC"
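The same idea extends to a whole vector of times. The sketch below pads three made-up time values and also shows a base R alternative, sprintf, for cases where you would prefer not to load stringr. The object names date_str, padded and dt are made up for this example.

```r
library(lubridate)
library(stringr)

hrmin    <- c(712, 1430, 905)   # made-up times: 7:12, 14:30, 9:05
date_str <- "2018/03/17"

padded <- str_pad(hrmin, width = 4, pad = "0")   # "0712" "1430" "0905"
dt <- ymd_hm(paste(date_str, padded))

hour(dt)     # 7 14 9
minute(dt)   # 12 30 5

# Base R alternative that avoids the stringr dependency
sprintf("%04d", as.integer(hrmin))               # "0712" "1430" "0905"
```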

5.2 Extracting date information

If you want to extract the day of the week from a date vector, use the wday function.

wday(x.date) 
[1] 1 3 3

If you want the day of the week displayed as its three letter designation, add the label=TRUE parameter.

wday(x.date, label=TRUE) 
[1] Sun Tue Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

You’ll note that the function returns a factor object with seven levels–one for each day of the week (Sun, Mon, Tue, Wed, Thu, Fri, Sat)–as well as the level hierarchy which will dictate the order in which values will be displayed if grouped by this factor. The levels are not necessarily reflected in the vector elements (only Sun, Tue are present), but the levels are there if we were ever to add additional day elements to this vector.
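The following sketch pokes at that factor a little further. It rebuilds the three dates as a plain Date vector (d, a made-up name) and also shows the week_start argument, which changes the day that is treated as the first day of the week.

```r
library(lubridate)

d  <- ymd(c("2013-06-23", "2013-07-30", "2014-08-12"))
wd <- wday(d, label = TRUE)

nlevels(wd)      # 7, even though only two distinct days occur
table(wd)        # counts for all seven days, zeros included
as.integer(wd)   # 1 3 3: the position of each value in the level order

# week_start = 1 numbers the days from Monday instead of Sunday
wday(d, week_start = 1)   # 7 2 2
```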

The following table lists lubridate functions that extract different elements of a date object.

Function         Extracted element
hour()           Hour of the day
minute()         Minute of the hour
day()            Day of the month
yday()           Day of the year
decimal_date()   Decimal year
month()          Month of the year
year()           Year
tz()             Time zone
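As a brief sketch, here are these functions applied to a single timestamp. The object t1 is made up for this example and uses UTC to keep the output independent of the local time zone.

```r
library(lubridate)

t1 <- ymd_hms("2013-06-23 03:45:23", tz = "UTC")

year(t1)           # 2013
month(t1)          # 6
day(t1)            # 23
yday(t1)           # 174
hour(t1)           # 3
minute(t1)         # 45
tz(t1)             # "UTC"
decimal_date(t1)   # fractional year, a little under 2013.5
```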

The month() and wday() functions have a label option that outputs the values as text rather than numbers. For example:

month(x.date, label=TRUE)
[1] Jun Jul Aug
Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec

Note that the names are stored as factors. This may prove useful in that the names will be sorted in chronological order, following the factor’s levels.

To get the full name, set the abbreviation parameter, abbr, to FALSE.

month(x.date, label=TRUE, abbr = FALSE)
[1] June   July   August
12 Levels: January < February < March < April < May < June < July < August < September < October < ... < December

5.3 Operating on dates

You can apply certain operations to dates as you would to numbers. For example, to list the number of days between the first and third elements of the vector x.date type the following:

(x.date[3] - x.date[1]) / ddays()
[1] 415.5949

To get the number of weeks between both dates:

(x.date[3] - x.date[1]) / dweeks()
[1] 59.37069

Likewise, you can get the number of minutes between dates by dividing by dminutes() and the number of years by dividing by dyears().
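A short sketch of those other units, using two standalone timestamps (t1 and t2, made-up names) that match the first and third elements of x.date:

```r
library(lubridate)

t1 <- mdy_hms("06/23/2013 03:45:23", tz = "EST5EDT")
t2 <- mdy_hms("08/12/2014 18:01:59", tz = "EST5EDT")

elapsed <- t2 - t1      # a difftime object

elapsed / ddays()       # about 415.59 days
elapsed / dminutes()    # the same span expressed in minutes
elapsed / dyears()      # the same span expressed in years
```

Each d-prefixed function returns a fixed-length duration, so the division simply rescales the same elapsed time into a different unit.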

You can also apply Boolean operations on dates. For example, to find which date element in x.date falls between the 11th and 24th day of any month, type:

(mday(x.date) > 11) & (mday(x.date) < 24)
[1]  TRUE FALSE  TRUE

If you want the command to return just the dates that satisfy this query, pass the Boolean operation as an index to the x.date vector:

x.date[ (mday(x.date) > 11) & (mday(x.date) < 24) ]
[1] "2013-06-23 03:45:23 AKDT" "2014-08-12 18:01:59 AKDT"
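The same indexing pattern works with any of the extraction functions or with direct comparisons between timestamps. The sketch below assumes a standalone vector x (rebuilt here in UTC) and a made-up cutoff value.

```r
library(lubridate)

x <- mdy_hms(c("06/23/2013 03:45:23", "07/30/2013 14:23:00",
               "08/12/2014 18:01:59"), tz = "UTC")

# Keep only the timestamps recorded in 2013
x[year(x) == 2013]

# Keep only the timestamps that fall on or after a cutoff
cutoff <- ymd_hms("2013-07-01 00:00:00", tz = "UTC")
x[x >= cutoff]
```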

5.4 Formatting date objects

You can create a character vector from a date object. This is useful if you want to annotate plots with dates or include date values in reports. For example, to convert the date object x.date to a “Month Year” character format, type the following:

format(x.date, "%B %Y")
[1] "June 2013"   "July 2013"   "August 2014"

The format function accepts many different date/time format codes listed in the following table (note the cases!).

Format code   Description
%a            Abbreviated weekday name
%A            Full weekday name
%m            Month as decimal number
%b            Abbreviated month name
%B            Full month name
%c            Date and time, locale-specific
%d            Day of the month as decimal number
%H            Hours as decimal number (00 to 23)
%I            Hours as decimal number (01 to 12)
%p            AM/PM indicator in the locale
%j            Day of year as decimal number
%M            Minute as decimal number (00 to 59)
%S            Second as decimal number
%U            Week of the year (00 to 53), with Sunday as the first day of the week
%W            Week of the year (00 to 53), with Monday as the first day of the week
%w            Weekday as decimal number (Sunday = 0)
%x            Date (locale-specific)
%X            Time (locale-specific)
%Y            4-digit year
%y            2-digit year
%Z            Time zone abbreviation
%z            Signed offset from UTC in hours and minutes (e.g. -0800)
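To get a feel for these codes, the sketch below combines a few of them on a single timestamp. The object t1 is made up for this example; codes that produce names or AM/PM markers depend on your locale, so their output is not shown.

```r
t1 <- as.POSIXct("2013-06-23 03:45:23", tz = "UTC")

format(t1, "%Y-%m-%d")        # "2013-06-23"
format(t1, "%H:%M")           # "03:45"
format(t1, "%j")              # "174"
format(t1, "%d/%m/%y %I %p")  # AM/PM marker follows the locale
format(t1, "%A, %B %d, %Y")   # weekday and month names follow the locale
```

Any text in the format string that is not a format code, such as the slashes and commas above, is copied to the output unchanged.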