Make sure to download and install the software, packages and data onto your computer as outlined in the following section prior to attending the workshop.


Software, packages and data (to install prior to workshop)

R

Version 3.4.2 is the latest version and the one used in this workshop. R is an open source data analysis and visualization programming environment whose roots go back to the S programming language developed at Bell Laboratories in the 1970’s by John Chambers. It’s available for most operating systems including Windows, Mac and Linux.

You can download the latest version of R for Windows from https://cran.r-project.org/bin/windows/base/ and for the latest version of R for Mac got to https://cran.r-project.org/bin/macosx/

RStudio

RStudio is an integrated development environment (IDE) for R. It offers a user friendly interface to R by including features such as a source code editor (with colored syntax), data table viewer, git and GitHub integration and markdown output. Note that RStudio is not needed to run R (which has its own IDE environment–albeit not as nice as RStudio’s) but it makes using R far easier. RStudio is free software, but unlike R, it’s maintained by a private entity which also distributes a commercial version of RStudio for businesses or individuals needing customer support.

You can download the free RStudio desktop from this link https://www.rstudio.com/products/rstudio/download3/#download

R packages

R packages are installed in the user’s home directory (C:/Users/…) by default. This is advantageous in that you do not need to have administrative privileges to install any package. But it can be a disadvantage in that if someone else logs on to the same computer where you installed a package, that person will not have access to it requiring that she install that same package in her home directory thereby duplicating an instance of that package on the same computer.

The following CRAN packages will be used in this workshop: raster, tmap, rastervis, gstat and rgdal. There are two ways you can install R packages from the CRAN repository: via the command line or via the RStudio interface.

Package installation option 1

For the command line approach simply run the following lines of code in an R console:

install.packages("raster")
install.packages("tmap")
install.packages("rastervis")
install.packages("gstat")
install.packages("rgdal")

Note that most of these packages have dependencies, but these dependencies are automatically loaded for you.

Package installation option 2

Packages can also be installed via the RStudio interface. To install a CRAN package from within RStudio, click on the Packages tab, select Install and choose Repository (CRAN, CRANextra) as the source location. In the following example, the package raster is installed from CRAN.

knitr::include_graphics("Figures/Install_CRAN_packages.png")

Data

Data used in this workshop can be downloaded from https://mgimond.github.io/megug2017/data.zip then extracted into a folder for use in the workshop.


R and RStudio basics

Command line vs. script file

Command line

R can be run from a R console or a RStudio command line environment. For example, we can assign four numbers to the object x then have R read out the values stored in x by typing the following at a command line:

x <- c(1,2,3,4)
x
[1] 1 2 3 4

<- is referred to as the assignment operator. Operations and functions to the right of the operator are stored in the object to the left.

R script file

If you intend on typing more than a few lines of code in a command prompt environment, or if you wish to save a series of commands as part of a project’s analysis, it is probably best that you type your commands in an a R script file. Such file is usually saved with a .R extension.

You create a new script by clicking on the upper-left icon and selecting R Script.

In RStudio, you can run (or execute in programming lingo) a line of code of an R script by placing a cursor anywhere on that line (while being careful not to highlight any subset of that line) and pressing the shortcut keys Ctrl+Enter (or Command+Enter on a Mac).

You can also run an entire block of code by selecting (highlighting) all lines to be run and pressing the shortcut keys Ctrl+Enter. Or, you can run the entire R script by pressing Ctrl+Alt+R.

In the following example, the R script file has three lines of code: two assignment operations and one regression analysis. The lines are run one at a time using the Ctrl+Enter keys and the output of each line is displayed in the console window.

Packages

One of R’s attractive features is its rich collection of packages designed for specific applications and analysis techniques. Packages allow researchers and scientists to share R functions and data with other users. Some packages come already installed with R, others must be downloaded separately from a CRAN repository or other locations such as GitHub or a personal website.

Base packages

R comes installed with a set of default packages. A snapshot of a subset of the installed base packages is shown below:

Using a package in a R session

Just because a package is installed on your computer (in your home directory or in a directory accessible to you) does not mean that you have access to its functions in an R session. For example, after installing the tmap package you might want to access one of its datasets (World) and make use of one of its functions (qtm) to generate a choropleth map. This requires that tmap’s function be loaded at the beginning of the session using the library() function. Here’s what happens when you don’t load the tmap package before using its World dataset and running its qtm() function:

data(World) 
qtm(World)

To make the functions and/or data of a package available in an R session, use the library() function:

library(tmap)
data(World)
qtm(World)

Getting a session’s info

Reproducibility is a fundamental idea behind an open source analysis environment such as R. So it’s only fitting that all aspects of your analysis environment be made available (along with your data and analysis results). This is because functions and programming environments may change in their behavior as versions evolve; this may be by design or the result of a bug in the code fixed in later versions. No pieces of software, open-source or commercial, are immune to this. It’s therefore important that you publicize the R session in which parts of or all analyses were conducted. A simple way to do this is to call the sessionInfo() function.

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rgdal_1.2-11 raster_2.5-8 sp_1.2-5     tmap_1.11   

loaded via a namespace (and not attached):
 [1] viridisLite_0.2.0  jsonlite_1.5       splines_3.4.2      geojsonlint_0.2.0 
 [5] foreach_1.4.3      R.utils_2.5.0      gtools_3.5.0       shiny_1.0.5       
 [9] expm_0.999-2       stats4_3.4.2       yaml_2.1.14        LearnBayes_2.15   
[13] backports_1.1.1    lattice_0.20-35    digest_0.6.12      RColorBrewer_1.1-2
[17] colorspace_1.3-2   htmltools_0.3.6    httpuv_1.3.5       Matrix_1.2-11     
[21] R.oo_1.21.0        plyr_1.8.4         XML_3.98-1.9       rmapshaper_0.3.0  
[25] gmodels_2.16.2     xtable_1.8-2       webshot_0.4.1      scales_0.5.0      
[29] gdata_2.18.0       satellite_1.0.0    gdalUtils_2.0.1.7  mapview_2.1.4     
[33] magrittr_1.5       mime_0.5           deldir_0.1-14      evaluate_0.10.1   
[37] R.methodsS3_1.7.1  nlme_3.1-131       MASS_7.3-47        class_7.3-14      
[41] tools_3.4.2        geosphere_1.5-5    stringr_1.2.0      V8_1.5            
[45] munsell_0.4.3      compiler_3.4.2     e1071_1.6-8        units_0.4-6       
[49] classInt_0.1-24    grid_3.4.2         tmaptools_1.2-1    RCurl_1.95-4.8    
[53] dichromat_2.0-0    iterators_1.0.8    htmlwidgets_0.9    crosstalk_1.0.0   
[57] bitops_1.0-6       base64enc_0.1-3    rmarkdown_1.6      boot_1.3-20       
[61] codetools_0.2-15   DBI_0.7            jsonvalidate_1.0.0 curl_2.8.1        
[65] R6_2.2.2           knitr_1.17         udunits2_0.13      rgeos_0.3-25      
[69] rprojroot_1.2      spdep_0.6-15       KernSmooth_2.23-15 stringi_1.1.5     
[73] osmar_1.1-7        Rcpp_0.12.12       sf_0.5-4           png_0.1-7         
[77] leaflet_1.1.0      coda_0.19-1       

Output includes all loaded base packages and external packages (e.g. tmap in this working example) as well as their version.

Defining the working directory

The following exercises assume that your data files are stored in the session’s working directory (this is where R will look for the GIS files).

You change an R session’s working directory via the following pull-down:

Raster manipulation basics

Loading a raster file

We’ll use the raster package to read a geoTIFF file. The raster package will be used throughout the workshop.

library(raster)
elev <- raster("Elev.tif")

Extracting raster properties and data

elev
class       : RasterLayer 
dimensions  : 2514, 2514, 6320196  (nrow, ncol, ncell)
resolution  : 2, 2  (x, y)
extent      : 509986, 515014, 4959986, 4965014  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=19 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
data source : \\filer.colby.edu\Personal\mgimond\Presentation\GIS_Educators_2017\Elev.tif 
names       : Elev 

The summary indicates that elev is a 2514 row by 2514 column raster with a total of 6,320,196 cell. The cell’s dimensions are 2 m by 2 m (the units are identified from the coordinate reference information in line 5 of the summary output).

You can access a raster’s content via the indexing system. You can either reference the cell number (enumerated starting at the upper left-hand corner of the raster working left to right, top to bottom).

elev[2614]
         
75.64097 

You can also reference a cell via the row and column numbers.

elev[ 2, 100]
         
75.64097 

Or you can reference a cell by it coordinate value (in the raster’s coordinate system).

cell <- cellFromXY(elev, c(513500, 4962000)) # Find index number
elev[cell] 
         
52.76538 

You can access all the cell values by leaving the index empty. This is useful if you want to run summary statistics on all the cell values. Note that the following command may take a while to run.

hist(elev[], main=NULL)

Managing memory

Raster files tend to be big and can therefore take up a lot of RAM. The raster package does not usually load the entire file into memory, this can be a good thing if your RAM cannot handle the file size. But if RAM is not an issue, you might be better off forcing the entire raster into RAM to speed up raster processing.

To figure out if elev is loaded into memory, type:

inMemory(elev)
[1] FALSE

To force the raster into memory, type:

elev.mem <- readAll(elev)
inMemory(elev.mem)
[1] TRUE

The difference in processing time can be significant. For example, the difference in processing time when computing the median pixel values is about five times. The following script makes use of the microbenchmark package to time processes.

microbenchmark::microbenchmark(
  median(elev[]),
  median(elev.mem[]),
  times = 20   # Number of times to run each funtion--average time is returned
)
Unit: milliseconds
               expr       min       lq     mean   median       uq      max neval
     median(elev[]) 619.85553 627.3215 714.8203 632.6234 795.7960 904.4759    20
 median(elev.mem[])  92.62083 102.8791 147.9735 104.3471 194.3925 289.1020    20

The output shows summary statistics for the runtime (i.e. the functions were run 20 times to generate a more representative runtime).

Cropping a raster using a vector

We’ll use the rgdal package to read the clip shapefile. The first argument is the folder location (relative to the session’s working directory) and the second argument is the name of the shapefile (without the .shp extension).

library(rgdal)
shp   <- readOGR(".","Clip")
OGR data source with driver: ESRI Shapefile 
Source: ".", layer: "Clip"
with 3 features
It has 1 fields
elev2 <- crop(elev.mem, shp)

Cropping a raster interactively

You can also crop a raster manually by drawing either a rectangle or a polygon directly on the plot.

plot(elev)
elev2 <- select(elev, use="rec") # To select by rectangle
elev2 <- select(elev, use="pol") # To select by polygon

Note that you will not see the outlines of the rectangle or polygon until you are done defining the crop region. If defining by polygon, press the esc key to terminate drawing.

Plotting a raster

For quick plotting, you can use the built-in plot function.

plot(elev2)

For greater control in plotting the raster, you might want to use the tmap package.

library(tmap)
tm_shape(elev2) + tm_raster(style= "sd", n = 10 , palette = "Greys")

The style parameter controls the classification method (standard deviation is chosen here). Other options include pretty, jenks and kmeans just to name a few.

If you want to adopt a continuous color scheme, set style to cont. We’ll also move the legend box off to the side.

tm_shape(elev2) + tm_raster(style= "cont", palette = "Greys")  + tm_legend(outside = TRUE)

You can also overlay vector layers.

tm_shape(elev2) + tm_raster(style= "cont", palette = "Greys") +
  tm_shape(shp) + tm_borders() + tm_legend(outside = TRUE)

Decreasing pixel resolution

To decrease a raster’s resolution, use the aggregate function to resample the raster. Here, we’ll resample by a factor of 100 (i.e. 100x100 pixels will be aggregated to a single cell) and use the arithmetic mean to compute the cell’s output value.

elev3a <- aggregate(elev2, fact=100, fun=mean, expand=TRUE)
tm_shape(elev3a) + tm_raster(style= "cont") + tm_legend(outside = TRUE)

Increasing pixel resolution

To increase a raster’s resolution, we’ll use the disaggregate function. Here we’ll double the pixel resolution (fact=2).

elev3b <- disaggregate(elev2, fact=2)
tm_shape(elev3b) + tm_raster(style= "cont") + tm_legend(outside = TRUE)

If you want to interpolate the output cells, you can add method = "bilinear" as a parameter to the function.

Coordinate systems

Coordinate systems are defined using Proj4 syntax. You can also define projections using EPSG’s coding system or ESRI’s coding system.

You can identify the current coordinate system as follows:

proj4string(elev2)
[1] "+proj=utm +zone=19 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

If the coordinate system is not defined, or if it is improperly defined, you can pass the coordinate definition as follows:

proj4string(elev2) <- CRS("+proj=utm +zone=19 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")

To change the projection to an azimuthal equidistant conic projection centered on Bangor we’ll first define the projection:

azed   <- CRS("+proj=aeqd +lat_0=44.8 +lon_0=-68.81 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

Next we’ll reproject the data:

elev2_aeqd <- projectRaster(elev2, crs=azed) 

If you want to use an ESRI defined projection code such as an azimuthal equidistant conic projection centered on the North Pole (ESRI projection code 54032),

esri1   <- CRS("+init=esri:54032")

EPSG offers a wide range of coordinate systems recognized by many industries, a few popular ones follow:

# epsg authority
merc   <- CRS("+init=epsg:3395")  # Mercator projection
gcs    <- CRS("+init=epsg:4326")  # Geographic (lat/long)
utm19  <- CRS("+init=epsg:32619") # UTM 19N WGS84

Map Algebra

Local operations

Unary operations

Here, we’ll convert the elevation units to feet.

elev_ft <- elev2 * 3.2808
tm_shape(elev2) + tm_raster(title="Elevation\n(feet)") + tm_legend(outside = TRUE)

Binary operation

Next, we’ll use conditional statements to create two intermediate rasters, then combine them using the Boolean operator & to identify all pixels whose elevation value is greater than 50 and less than 60.

bin1 <- elev2 > 50 & elev2 < 60
tm_shape(bin1) + tm_raster() + tm_legend(outside = TRUE)

Focal Operations

Focal operations require that a kernel be defined. The kernel can be defined via a weighted matrix (as used in the following example) or a function. The kernel definition is passed to the w parameter in the focal function.

elev2f <- focal(elev2, w=matrix(1/10201,nrow=101,ncol=101))
tm_shape(elev2f) + tm_raster() + tm_legend(outside = TRUE)

Specialized focal stats functions: slope, aspect and hillshade

In this example, we’ll compute a slope using the terrain function. Output can be in degrees or radians (we’ll choose the latter since such units are needed to create a hillshade in a later step).

slope <- terrain(elev2, opt="slope", unit="radians", neighbors=8)
tm_shape(slope) + tm_raster() + tm_legend(outside = TRUE)

Next, we’ll compute the aspect.

aspect <- terrain(elev2, opt="aspect", unit="radians", neighbors=8)
tm_shape(aspect) + tm_raster() + tm_legend(outside = TRUE)

We can now combine the aspect and slope rasters to generate a Hillshade.

shade <- hillShade(slope, aspect)
tm_shape(shade) +tm_raster(style="cont", palette = "Greys") +tm_legend(show=FALSE)

Zonal operations

Here, we’ll use the zonal function to summarize pixel values by zone. The zonal layer must be a raster so we’ll first rasterize the shp vector object. Since the zonal function will expect the zonal layer to share the same extent as the (to be summarised) raster we’ll use elev2 to define shp’s raster output extent and pixel resolution. We also need to specify the attribute whose value will be assigned to the output pixels (field = "Id", note that R is case sensitive).

shp2rast <- rasterize(shp, elev2, field = "Id")

Now let’s compute the zonal stats and output the table.

zone_stat <- zonal(elev2, shp2rast, fun=mean)
zone_stat
     zone    value
[1,]    1 53.09672
[2,]    2 49.89339
[3,]    3 59.05706

Global operations

An example of a global operation is the calculation of a euclidean and geodesic distance. For this exercise, we’ll load a point layer, rasterize it, then generate a distance raster from that point. If you have not loaded the rgdal package in an earlier step, do so before running the following script which will load the point shapefile.

pt <- readOGR(".", "Conference")
OGR data source with driver: ESRI Shapefile 
Source: ".", layer: "Conference"
with 1 features
It has 1 fields

Next, we’ll rasterize the point to match elev’s pixel resolution and extent.

pt.ras <- rasterize(pt, elev2, field="Id")

Now, compute the euclidean distance. Note that if the data is projected in a planar coordinate system, the output will be a Euclidean (planar) distance raster in the projection’s map units.

dist1 <- distance(pt.ras) 
tm_shape(dist1) + tm_raster(style="cont") +
  tm_shape(pt) + tm_dots(col="red", size=1.1)

If the input coordinate system is geographic (lat/long) then the output will be a geodesic distance raster.

For example, let’s first project the point to a GCS. Since the point is a vector, we’ll use another projection function called spTransform.

gcs    <- CRS("+init=epsg:4326")  # Geographic (lat/long)
pt.gcs <- spTransform(pt, gcs)

We’ll then rasterize the point by manually defining its global extent.

r.globe <- raster(nrows=180, ncols=360, crs=gcs)
pt.ras.gcs <- rasterize(pt.gcs, r.globe, field="Id")

Now compute the geodesic distance.

dist2 <- distance(pt.ras.gcs)

Note that the geodesic distance units are in meters.

Let’s map the raster and add a World overlay. Note that even though the raster is in a GCS, the map will default to a cylindrical coordinate system.

data(World) # Dataset built into tmap
tm_shape(dist2) + tm_raster() +
  tm_shape(World) +tm_borders() +
  tm_shape(pt) + tm_dots(col="red", size=1.1)

We can view the layers in an azimuthal equidistant coordinate system centered on Bangor to check that the distances are indeed geodesic.

azed   <- CRS("+proj=aeqd +lat_0=44.8 +lon_0=-68.81 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
tm_shape(dist2, projection=azed) + tm_raster() +
  tm_shape(World) + tm_borders() +
  tm_shape(pt) + tm_dots(col="red", size=1.1)

Multi-band rasters

Load individual bands:

b2  <- raster("Band2.tif")
b3  <- raster("Band3.tif")
b4  <- raster("Band4.tif")
b5  <- raster("Band5.tif")

You can create two types of multiband raster: a RasterStack or a RasterBrick.

r.stack <- stack(b2,b3,b4,b5)
r.brick <- brick(b2,b3,b4,b5)

They differ in how they are stored in your current R session. A RasterBrick stores the entire data into memory while a RasterStack does not. This usually make the RasterBrick more efficient to work with.

object.size(r.stack)
53680 bytes
object.size(r.brick)
8996016 bytes

You can subset a multiband raster using the subset function. For example, to subset bands 2 through 4:

r.brick.sub <- subset(r.brick, c(2,3,4))

To grab the spectrum for a pixel, simply specify the row and column numbers (e.g. row 2 and column 100).

r.brick[2, 100]
     Band2 Band3 Band4 Band5
[1,]  8620  8482  6778 23196

Note that the output is a matrix. If you want to generate a line plot, you will need to convert the matrix to a vector.

plot(as.vector(r.brick[2,100]), type="b")

If you want to extract cell values from an xy coordinate value (using the raster’s coordinate system), you first need to identify the cell index using the cellFromXY function, then pass that index to the raster object.

cell <- cellFromXY(r.brick, c(526700, 4961400))
r.brick[cell]
     Band2 Band3 Band4 Band5
[1,]  9454  9388  8545 18214

You can display three bands from a multiband raster using a composite RGB plot.

plotRGB(r.brick, r=4,g=3,b=2, stretch="lin", axes=TRUE)

Writing raster files

You can write rasters to different file formats:

library(rgdal)

# GeoTIFF format
writeRaster(r.brick, filename="Bangor.tif", format='GTiff', overwrite=TRUE)

# Native raster format
writeRaster(r.brick, filename="Bangor.grd", format='raster', overwrite=TRUE)

# ERDAS Imagine format
writeRaster(r.brick, filename="Bangor.img", format='HFA', overwrite=TRUE)

If you have the rgdal package installed, you probably have access to many more raster output formats. To see the list, type gdalDrivers() at a command line.

Additional resources