A Reading and writing spatial data in R

R sf terra tidygeocoder spatstat
4.3.1 1.0.14 1.7.65 1.0.5 3.0.7

Sample files for this exercise

First, you will need to download some sample files from the github repository. Make sure to set your R session folder to the directory where you will want to save the sample files before running the following code chunks.

download.file("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling.zip", 
              destfile = "Income_schooling.zip" , mode='wb')
unzip("Income_schooling.zip", exdir = ".")
file.remove("Income_schooling.zip")
download.file("https://github.com/mgimond/Spatial/raw/main/Data/rail_inters.gpkg", 
              destfile = "./rail_inters.gpkg", mode='wb')
download.file("https://github.com/mgimond/Spatial/raw/main/Data/elev.img",  
              destfile = "./elev.img", mode='wb')               

Introduction

There are several different R spatial formats to choose from. Your choice of format will largely be dictated by the package(s) and or function(s) used in your workflow. A breakdown of formats and intended use are listed below.

Data format Used with… Used in package… Used for… Comment
sf vector sf, others visualizing, manipulating, querying This is the new spatial standard in R. Will also read from spatially enabled databases such as postgresSQL.
raster raster raster, others visualizing, manipulating, spatial statistics This has been the most popular raster format fo rmany years. But, it is gradually being supplanted by terra
SpatRaster terra terra, others visualizing, manipulating, spatial statistics This is gradually replacing raster
SpatialPoints* SpatialPolygons* SpatialLines* SpatialGrid* vector and raster sp, spdep Visualizing, spatial statistics These are legacy formats. spdep now accepts sf objects
ppp owin vector spatstat Point pattern analysis/statistics NA
im raster spatstat Point pattern analysis/statistics NA
1 The spatial* format includes SpatialPointsDataFrame, SpatialPolygonsDataFrame, SpatialLinesDataFrame, etc…

There is an attempt at standardizing the spatial format in the R ecosystem by adopting a well established set of spatial standards known as simple features. This effort results in a recently developed package called sf (Pebesma 2018). It is therefore recommended that you work in an sf framework when possible. As of this writing, most of the basic data manipulation and visualization operations can be successfully conducted using sf spatial objects.

Some packages such as spdep and spatstat require specialized data object types. This tutorial will highlight some useful conversion functions for this purpose.

Creating spatial objects

The following sections demonstrate different spatial data object creation strategies.

Reading a shapefile

Shapefiles consist of many files sharing the same core filename and different suffixes (i.e. file extensions). For example, the sample shapefile used in this exercise consists of the following files:

[1] "Income_schooling.dbf" "Income_schooling.prj" "Income_schooling.sbn" "Income_schooling.sbx"
[5] "Income_schooling.shp" "Income_schooling.shx"

Note that the number of files associated with a shapefile can vary. sf only needs to be given the *.shp name. It will then know which other files to read into R such as projection information and attribute table.

library(sf)
s.sf <- st_read("Income_schooling.shp")

Let’s view the first few records in the spatial data object.

head(s.sf, n=4)  # List spatial object and the first 4 attribute records
Simple feature collection with 4 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569
Projected CRS: NAD83 / UTM zone 19N
         NAME Income   NoSchool NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.00140696  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.00115002  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.00212896  724.242 MULTIPOLYGON (((445039.5 51...
4   Penobscot  23307 0.00684534 0.00102545  242.424 MULTIPOLYGON (((472271.3 49...

Note that the sf object stores not only the geometry but the coordinate system information and attribute data as well. These will be explored later in this exercise.

Reading a GeoPackage

A geopackage can store more than one layer. To list the layers available in the geopackage, type:

st_layers("rail_inters.gpkg")
Driver: GPKG 
Available layers:
  layer_name     geometry_type features fields             crs_name
1 Interstate Multi Line String       35      1                NAD83
2       Rail Multi Line String      730      3 NAD83 / UTM zone 19N

In this example, we have two separate layers: Interstate and Rail. We can extract each layer separately via the layer= parameter.

inter.sf <- st_read("rail_inters.gpkg", layer="Interstate")
rail.sf  <- st_read("rail_inters.gpkg", layer="Rail")

Reading a raster

In earlier versions of this tutorial, the raster package was used to read raster files. This is being supplanted by terra which will be the package used in this and in subsequent exercises.

terra will read many different raster file formats such as geoTiff, Imagine and HDF5 just to name a few. To see a list of supported raster file formats on your computer simply run:

terra::gdal(drivers = TRUE) |> subset(type == "raster")

In the following example, an Imagine raster file is read into R using the rast function.

library(terra)
elev.r <- rast("elev.img")

The object class is of type SpatRaster.

class(elev.r)
[1] "SpatRaster"
attr(,"package")
[1] "terra"

What sets a SpatRaster object apart from other R data file objects is its storage. By default, data files are loaded into memory, but SpatRaster objects are not. This can be convenient when working with raster files too large for memory. But this comes at a performance cost. If your RAM is large enough to handle your raster file, it’s best to load the entire dataset into memory.

To check if the elev.r object is loaded into memory, run:

inMemory(elev.r)
[1] FALSE

An output of FALSE indicates that it is not. To force the raster into memory use set.values:

set.values(elev.r)

Let’s check that the raster is indeed loaded into memory:

inMemory(elev.r)
[1] TRUE

Now let’s look at the raster’s properties:

elev.r
class       : SpatRaster 
dimensions  : 994, 652, 1  (nrow, ncol, nlyr)
resolution  : 500, 500  (x, y)
extent      : 336630.3, 662630.3, 4759303, 5256303  (xmin, xmax, ymin, ymax)
coord. ref. : NAD_1983_UTM_Zone_19N (EPSG:26919) 
source(s)   : memory
varname     : elev 
name        : Layer_1 
min value   :       0 
max value   :    1546 

The raster object returns its grid dimensions (number of rows and columns), pixel size/resolution (in the layer’s coordinate system units), geographic extent, native coordinate system (UTM NAD83 Zone 19 with units of meters) and min/max raster values.

Creating a spatial object from a data frame

Geographic point data locations recorded in a spreadsheet can be converted to a spatial point object. Note that it’s important that you specify the coordinate system used to record the coordinate pairs since such information is not stored in a data frame. In the following example, the coordinate values are recorded in a WGS 1984 geographic coordinate system (crs = 4326).

# Create a simple dataframe with lat/long values
df <- data.frame(lon = c(-68.783, -69.6458, -69.7653),
                 lat = c(44.8109, 44.5521, 44.3235),
                 Name= c("Bangor", "Waterville", "Augusta"))

# Convert the dataframe to a spatial object. Note that the
# crs= 4326 parameter assigns a WGS84 coordinate system to the 
# spatial object
p.sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326) 
p.sf  
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -69.7653 ymin: 44.3235 xmax: -68.783 ymax: 44.8109
Geodetic CRS:  WGS 84
        Name                 geometry
1     Bangor  POINT (-68.783 44.8109)
2 Waterville POINT (-69.6458 44.5521)
3    Augusta POINT (-69.7653 44.3235)

Geocoding street addresses

The tidygeocoder package will convert street addresses to latitude/longitude coordinate pairs using a wide range of geocoding services such as the US census and Google. Some of these geocoding services will require an API key, others will not. Click here to see the list of geocoding services supported by tidygeocoder and their geocoding limitations. In the example that follows, the osm geocoding service is used by default.

library(tidygeocoder)

options(pillar.sigfig = 7) # Increase significant digits in displayed output

dat <- data.frame(
  name = c("Colby College", "Bates College", "Bowdoin College"),
  address = c("4000 Mayflower drive, Waterville, ME , 04901",
              "275 College st, Lewiston, ME 04240",
              "255 Maine St, Brunswick, ME 04011"))

geocode(.tbl = dat, address = address, method = "osm")           
# A tibble: 3 × 4
  name            address                                           lat      long
  <chr>           <chr>                                           <dbl>     <dbl>
1 Colby College   4000 Mayflower drive, Waterville, ME , 04901 44.56615 -69.66232
2 Bates College   275 College st, Lewiston, ME 04240           44.10638 -70.20636
3 Bowdoin College 255 Maine St, Brunswick, ME 04011            43.90870 -69.96142

Another free (but manual) alternative, is to use the US Census Bureau’s web geocoding service for creating lat/lon values from a file of US street addresses. This needs to be completed via their web interface and the resulting data table (a CSV file) would then need to be loaded into R as a data frame.

Converting from an sf object

Packages such as spdep (older versions only) and spatsat do not support sf objects. The following sections demonstrate methods to convert from sf to other formats.

Converting an sf object to a Spatial* object (spdep/sp)

The following code will convert point, polyline or polygon features to a spatial* object. While the current version of spdep will now accept sf objects, converting to spatial* objects will be necessary with legacy spdep packages. In this example, an sf polygon feature is converted to a SpatialPolygonsDataFrame object.

s.sp <- as_Spatial(s.sf)
class(s.sp)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

Converting an sf polygon object to an owin object

The spatstat package is used to analyze point patterns however, in most cases, the study extent needs to be explicitly defined by a polygon object. The polygon should be of class owin.

library(spatstat)
s.owin <- as.owin(s.sf)
class(s.owin)
[1] "owin"

Note the loading of the package spatstat. This is required to access the as.owin.sf method for sf. Note too that the attribute table gets stripped from the polygon data. This is usually fine given that the only reason for converting a polygon to an owin format is for delineating the study boundary.

Converting an sf point object to a ppp object

The spatstat package is currently designed to work with projected (planar) coordinate system. If you attempt to convert a point object that is in a geographic coordinate system, you will get the following error message:

p.ppp <- as.ppp(p.sf)
Error: Only projected coordinates may be converted to spatstat class objects

The error message reminds us that a geographic coordinate system (i.e. one that uses angular measurements such as latitude/longitude) cannot be used with this package. If you encounter this error, you will need to project the point object to a projected coordinate system.

In this example, we’ll project the p.sf object to a UTM coordinate system (epsg=32619). Coordinate systems in R are treated in a later appendix.

p.sf.utm <- st_transform(p.sf, 32619) # project from geographic to UTM
p.ppp <- as.ppp(p.sf.utm)             # Create ppp object
class(p.ppp)
[1] "ppp"

Note that if the point layer has an attribute table, its attributes will be converted to ppp marks. These attribute values can be accessed via marks(p.ppp).

Converting a SpatRaster object to an im object

To create a spatstat im raster object from a SpatRaster object, you will need to first create a three column dataframe from the SpatRaster objects with the first two columns defining the X and Y coordinate values of each cell, and the third column defining the cell values

df <- as.data.frame(elev.r,xy=TRUE)
elev.im <- as.im(df)
class(elev.im)
[1] "im"

Converting to an sf object

All aforementioned spatial formats, except owin, can be coerced to an sf object via the st_as_sf function. for example:

st_as_sf(p.ppp)  # For converting a ppp object to an sf object
st_as_sf(s.sp)   # For converting a Spatial* object to an sf object

Dissecting the sf file object

head(s.sf,3)
Simple feature collection with 3 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569
Projected CRS: NAD83 / UTM zone 19N
         NAME Income   NoSchool NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.00140696  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.00115002  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.00212896  724.242 MULTIPOLYGON (((445039.5 51...

The first line of output gives us the geometry type, MULTIPOLYGON, a multi-polygon data type. This is also referred to as a multipart polygon. A single-part sf polygon object will adopt the POLYGON geometry.

The next few lines of output give us the layer’s bounding extent in the layer’s native coordinate system units. You can extract the extent via the st_bbox() function as in st_bbox(s.sf).

The following code chunk can be used to extract addition coordinate information from the data.

st_crs(s.sf)

Depending on the version of the PROJ library used by sf, you can get two different outputs. If your version of sf is built with a version of PROJ older than 6.0, the output will consist of an epsg code (when available) and a proj4 string as follows:

Coordinate Reference System:
  EPSG: 26919 
  proj4string: "+proj=utm +zone=19 +datum=NAD83 +units=m +no_defs"

If your version of sf is built with a version of PROJ 6.0 or greater, the output will consist of a user defined CS definition (e.g. an epsg code), if available, and a Well Known Text (WKT) formatted coordinate definition that consists of a series of [ ] tags as follows:

Coordinate Reference System:
  User input: NAD83 / UTM zone 19N 
  wkt:
PROJCRS["NAD83 / UTM zone 19N",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["UTM zone 19N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-69,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",26919]]

The WKT format will usually start with a PROJCRS[...] tag for a projected coordinate system, or a GEOGCRS[...] tag for a geographic coordinate system. More information on coordinate systems in R can be found in the coordinate systems appendix.

What remains of the sf summary output is the first few records of the attribute table. You can extract the object’s table to a dedicated data frame via:

s.df <- data.frame(s.sf)
class(s.df)
[1] "data.frame"
head(s.df, 5)
         NAME Income   NoSchool  NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.001406960  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.001150020  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.002128960  724.242 MULTIPOLYGON (((445039.5 51...
4   Penobscot  23307 0.00684534 0.001025450  242.424 MULTIPOLYGON (((472271.3 49...
5  Washington  20015 0.00478188 0.000966036  327.273 MULTIPOLYGON (((645446.5 49...

The above chunk will also create a geometry column. This column is somewhat unique in that it stores its contents as a list of geometry coordinate pairs (polygon vertex coordinate values in this example).

str(s.df)
'data.frame':   16 obs. of  6 variables:
 $ NAME      : chr  "Aroostook" "Somerset" "Piscataquis" "Penobscot" ...
 $ Income    : int  21024 21025 21292 23307 20015 21744 21885 23020 25652 24268 ...
 $ NoSchool  : num  0.01339 0.00521 0.00634 0.00685 0.00478 ...
 $ NoSchoolSE: num  0.001407 0.00115 0.002129 0.001025 0.000966 ...
 $ IncomeSE  : num  251 391 724 242 327 ...
 $ geometry  :sfc_MULTIPOLYGON of length 16; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:32, 1:2] 513821 513806 445039 422284 424687 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"

You can also opt to remove this column prior to creating the dataframe as follows:

s.nogeom.df <- st_set_geometry(s.sf, NULL) 
class(s.nogeom.df)
[1] "data.frame"
head(s.nogeom.df, 5)
         NAME Income   NoSchool  NoSchoolSE IncomeSE
1   Aroostook  21024 0.01338720 0.001406960  250.909
2    Somerset  21025 0.00521153 0.001150020  390.909
3 Piscataquis  21292 0.00633830 0.002128960  724.242
4   Penobscot  23307 0.00684534 0.001025450  242.424
5  Washington  20015 0.00478188 0.000966036  327.273

Exporting to different data file formats

You can export an sf object to many different spatial file formats such as a shapefile or a geopackage.

st_write(s.sf, "shapefile_out.shp", driver = "ESRI Shapefile")  # create to a shapefile 
st_write(s.sf, "s.gpkg", driver = "GPKG")  # Create a geopackage file

If the file you are writing to already exists, the above will throw an error. To force an overwrite, simply add the delete_layer = TRUE argument to the st_write function.

You can see a list of writable vector formats via:

gdal(drivers = TRUE) |> subset(can %in% c("write", "read/write" ) & type == "vector")

The value in the name column is the driver name to pass to the driver = argument in the st_write() function.

To export a raster to a data file, use writeRaster() function.

writeRaster(elev.r, "elev_out.tif", gdal = "GTiff" ) # Create a geoTiff file
writeRaster(elev.r, "elev_out.img", gdal = "HFA" )  # Create an Imagine raster file

You can see a list of writable raster formats via:

gdal(drivers = TRUE) |> subset(can %in% c("write", "read/write" ) & type == "raster")

The value in the name column is the driver name to pass to the gdal = argument in the writeRaster() function.

References

Pebesma, Edzer. 2018. Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.