Spatial data manipulation in R

Manny Gimond, Colby RUG
November, 2017

Data for this tutorial can be downloaded as a zipped file by clicking here.

Packages for basic spatial data manipulation

sp

Original spatial feature implementation in R.

tmap

A mapping package.

proj4

A popular coordinate system transformation/definition package.

rgdal

A package that reads a wide range of vector and raster files.

sf

The package sf (simple features) is based off the OGC/ISO for two-dimensional geometry data models.

raster

A comprehensive package for raster data manipulation.

Specialized Packages for spatial statistics

spatstat

This package encompasses a whole suite of statistical techniques for tackling point pattern analysis including nearest neighbor analysis and Poisson point process models.

spdep

This package is used to explore the spatial distribution of attribute information using analytical techniques such as spatial autocorrelation and fitting spatial autoregressive models.

DCluster

A set of functions for the detection of spatial clusters of disease using count data.

gstats

A variogram modelling environment used in interpolating discrete events which includes spatio-temporal kriging.

Spatial data classes

sp/spdep

SpatialPoints*: point vector
SpatialPolygons*: polygon vector
SpatialLines*: line vector
SpatialGrid*: raster

spatstat

ppp: point vector
im: raster

raster

Rasterlayer: single layer raster
RasterBrick: multi-layer raster

sf

  • Has 17 feature types! e.g. POINT, POLYGON, MULTIPOINT, GEOMETRYCOLLECTION
  • Seeks to replace sp standard

Loading a spatial file

library(sf)
shp <- st_read("Maine.shp")

sf can read a wide range of vector data file formats including spatially enabled databases such as postgresSQL (via the postGIS front-end) and popular web formats such as GeoJSON. It uses rgdal as the backend.

The sf data structure

print(shp, n=6)
Simple feature collection with 16 features and 1 field
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 336337.7 ymin: 4772272 xmax: 660529.1 ymax: 5255569
epsg (SRID):    26919
proj4string:    +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs
First 6 features:
         NAME                       geometry
1   Aroostook MULTIPOLYGON (((513821.0732...
2    Somerset MULTIPOLYGON (((379071.8125...
3 Piscataquis MULTIPOLYGON (((445039.4925...
4   Penobscot MULTIPOLYGON (((472271.3073...
5  Washington MULTIPOLYGON (((645446.5278...
6    Franklin MULTIPOLYGON (((355457.8619...

Plotting the data

library(tmap)
tm_shape(shp) + 
  tm_fill(col="NAME") + 
  tm_legend(outside=TRUE)

plot of chunk unnamed-chunk-3

Joining a table to a spatial object

Here, we'll use dplyr's left_join to join a plain data file to a spatial object.

dat  <- read.csv("Income.csv")
head(dat,1)
     County Income
1 Aroostook  21024
library(dplyr)
shp2 <- left_join(shp, dat, 
                  by=c("NAME" = "County"))
tm_shape(shp2) + 
   tm_fill(col="Income",
           style="pretty",
           palette = "Greens") +
   tm_legend(outside=TRUE)

plot of chunk unnamed-chunk-5

Loading XY data

df <- data.frame(lon = c(-68.783, -69.6458, -69.7653),
                 lat = c(44.8109, 44.5521, 44.3235),
                 Name= c("Bangor", "Waterville", "Augusta"))

pts <- st_as_sf(df, coords = c("lon", "lat"),crs = 4326)
pts                          
Simple feature collection with 3 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -69.7653 ymin: 44.3235 xmax: -68.783 ymax: 44.8109
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
        Name                 geometry
1     Bangor  POINT (-68.783 44.8109)
2 Waterville POINT (-69.6458 44.5521)
3    Augusta POINT (-69.7653 44.3235)

Geocoding addresses

df2 <- data.frame(
  Address = c("4000 mayflower Hill, Waterville, Maine", 
              "2 Andrews Rd, Lewiston, ME 04240",
              "255 Maine St, Brunswick, ME 04011"),
  College = c("Colby", "Bates", "Bowdoin"),  stringsAsFactors = FALSE)
library(ggmap)
colleges <- mutate_geocode(df2, location=Address, output="latlona", source="dsk" )
colleges
                                 Address College       lon      lat
1 4000 mayflower Hill, Waterville, Maine   Colby -69.65208 44.56719
2       2 Andrews Rd, Lewiston, ME 04240   Bates -70.20540 44.10721
3      255 Maine St, Brunswick, ME 04011 Bowdoin -69.96525 43.90702
                                 address
1 4000 mayflower Hill, Waterville, Maine
2       2 Andrews Rd, Lewiston, ME 04240
3      255 Maine St, Brunswick, ME 04011

The output is a data.frame that can be coerced into an sf object.

pts2 <- st_as_sf(colleges, coords = c("lon", "lat"), crs = 4326)

Loading and mapping a raster file

library(raster)
elev <- raster("Elev.tif")

tm_shape(elev) + 
  tm_raster(style = "cont",
            palette = "Greys",
            legend.show = FALSE) 

plot of chunk unnamed-chunk-10

Combining layers in a map

tm_shape(elev) + 
  tm_raster(style = "cont",
            palette = "Greys",
            legend.show = FALSE) +
tm_shape(shp2) + 
  tm_fill(col="Income",
         style="quantile", n=6,
         palette = "Greens",
         alpha = 0.5) +
  tm_legend(outside=TRUE) +
tm_shape(pts2) + 
  tm_dots(col="yellow", 
          border.col="black",size=.3) +
  tm_text("College", auto.placement = TRUE, 
          shadow=TRUE)

plot of chunk unnamed-chunk-11

Checking and defining a layer's coordinate system

Retrieve coordinate information

data(World) # retrieve built-in  world map
world_sf <- st_as_sf(World) # Convert to sf object
st_crs(world_sf)
Coordinate Reference System:
  No EPSG code
  proj4string: "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"

Defining the CS

If a coordinate system is not defined, or if it is not properly defined you can set the coordinate system as follows:

world_ek <- st_set_crs(world_sf, 
            "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 
             +ellps=WGS84")
tm_shape(world_ek) + tm_borders()

Note that setting a coordinate system is not the same as transforming a coordinate system.

plot of chunk unnamed-chunk-14

Transforming a layer's coordinate system

world_aeqd <- st_transform(world_sf, 
              "+proj=aeqd +lon_0=-69.5 +lat_0=44.5")
tm_shape(world_aeqd) + tm_borders()

plot of chunk unnamed-chunk-16

Subset geometric features by attribute

You can manipulate features by their attributes via dplyr!

library(dplyr)
shp.sub <- shp %>% 
  filter(NAME == "Kennebec")

tm_shape(shp.sub) + 
  tm_fill(col = "bisque") +
  tm_borders() + 
  tm_layout(frame=FALSE)

Or, for piping diehards,

shp %>% filter(NAME == "Kennebec") %>% 
 tm_shape()  +  
  tm_fill(col = "bisque") +
  tm_borders() + 
  tm_layout(frame=FALSE)

plot of chunk unnamed-chunk-18