Reading and writing spatial data in R

Introduction

There are several different R spatial formats to choose from. Your choice of format will largely be dictated by the package(s) and or function(s) used in your workflow. A breakdown of formats and intended use are listed below.

Data format Used with… Used in package… Used for… Comment
sf vector sf, others visualizing, manipulating, querying This is likely to become the new spatial standard in R. Will also read from spatially enabled databases such as postgresSQL.
raster raster raster, others visualizing, manipulating, spatial statistics This is the most versatile raster format
SpatialPoints* SpatialPolygons* SpatialLines* SpatialGrid* vector and raster sp, spdep Visualizing, spatial statistics May be superseded by sf in the future
ppp owin vector spatstat Point pattern analysis/statistics NA
im raster spatstat Point pattern analysis/statistics NA
1 The spatial* format includes SpatialPointsDataFrame, SpatialPolygonsDataFrame, SpatialLinesDataFrame, etc…

There is an attempt at standardizing the spatial format in the R ecosystem by adopting a well established set of spatial standards known as simple features. This effort results in a recently developed package called sf. It is therefore recommended that you work in an sf framework when possible. As of this writing, most of the basic data manipulation and visualization operations can be successfully conducted using sf spatial objects.

Some packages such as spdep and spatstat require specialized data object types. This tutorial will highlight some useful conversion functions for this purpose.

Creating spatial objects

The following sections demonstrate different spatial data object creation strategies.

Reading a shapefile

Shapefiles consist of many files sharing the same core filename and different suffixes (i.e. file extensions). For example, the sample shapefile used in this exercise consists of the following files:

[1] "Income_schooling.dbf" "Income_schooling.prj" "Income_schooling.sbn" "Income_schooling.sbx"
[5] "Income_schooling.shp" "Income_schooling.shx"

Note that the number of files associated with a shapefile can vary. sf only needs to be given the *.shp name. It will then know which other files to read into R such as projection information and attribute table.

Let’s view the first few records in the spatial data object.

Simple feature collection with 4 features and 5 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569
epsg (SRID):    26919
proj4string:    +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs
         NAME Income   NoSchool NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.00140696  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.00115002  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.00212896  724.242 MULTIPOLYGON (((445039.5 51...
4   Penobscot  23307 0.00684534 0.00102545  242.424 MULTIPOLYGON (((472271.3 49...

Note that the sf object stores not only the geometry but the coordinate system information and attribute data as well. These will be explored later in this exercise.

Reading a GeoPackage

A geopackage can store more than one layer. To list the layers available in the geopackage, type:

Driver: GPKG 
Available layers:
  layer_name     geometry_type features fields
1 Interstate Multi Line String       35      1
2       Rail Multi Line String      730      3

In this example, we have two separate layers: Interstate and Rail. We can extract each layer separately via the layer= parameter.

Reading a raster

The raster package will read many different raster file formats such as geoTiff, Imagine and HDF5 just to name a few. To see a list of supported raster file formats simply run rgdal::gdalDrivers() at a command prompt. The rgdal package is normally installed with your installation of raster.

In the following example, an Imagine raster file is read into R.

What sets a raster object apart from other R data file objects is its storage. By default, data files are loaded into memory but raster objects are not. This can be convenient when working with raster files too large for memory. But this comes at a performance cost. If your RAM is large enough to handle your raster file, it’s best to load the entire dataset into memory.

To check if the elev.r object is loaded into memory, run:

[1] FALSE

To force the raster into memory use readAll():

Let’s check that the raster is indeed loaded into memory:

[1] TRUE

Now let’s look at the raster’s properties:

class      : RasterLayer 
dimensions : 994, 652, 648088  (nrow, ncol, ncell)
resolution : 500, 500  (x, y)
extent     : 336630.3, 662630.3, 4759303, 5256303  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=19 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
source     : memory
names      : elev 
values     : 0, 1546  (min, max)

The raster object returns its grid dimensions (number of rows and columns), pixel size/resolution (in the layer’s coordinate system units), geographic extent, native coordinate system (UTM NAD83 Zone 19 with units of meters) and min/max raster values.

Creating a spatial object from a data frame

Geographic point data locations recorded in a spreadsheet can be converted to a spatial point object. Note that it’s important that you specify the coordinate system used to record the coordinate pairs since such information is not stored in a data frame. In the following example, the coordinate values are recorded in a WGS 1984 geographic coordinate system (crs = 4326).

Simple feature collection with 3 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -69.7653 ymin: 44.3235 xmax: -68.783 ymax: 44.8109
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
        Name                 geometry
1     Bangor  POINT (-68.783 44.8109)
2 Waterville POINT (-69.6458 44.5521)
3    Augusta POINT (-69.7653 44.3235)

Geocoding street addresses

The ggmap package offers a geocoding function called mutate_geocode which will take a table with physical addresses and create a new table with latitude and longitude values for those addresses. However, as of Spring 2019, ggmap will only access Google’s API which requires that a key be created on the Google Cloud (the latter will also require that a paid account be created with Google Cloud). The Data Science Toolkit, a previously free API alternative, has (as of May 2019) terminated its mapping services.

The Google API option will not be covered here, instead, the reader is encouraged to read the detailed instructions on ggmap’s Github page.

For a free (but manual) alternative, you can use the US Census Bureau’s geocoding service for creating lat/lon values from US street addresses. This needs to be completed via their web interface and the resulting data table (a CSV file) would then need to be loaded into R as a data frame.

Converting from an sf object

Packages such as spdep and spatsat currently do not support sf objects. The following sections demonstrate methods to convert from sf to other formats.

Converting an sf object to a Spatial* object (spdep/sp)

The following code will convert point, polyline or polygon features to a spatial* object. In this example, an sf polygon feature is converted to a SpatialPolygonsDataFrame object.

[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

Note that if you wish to create a Spatial* object directly from a shapefile (and bypass the sf object creation), you could run the maptools function readShapeSpatial("Income_schooling.shp"). However, this approach strips the coordinate system information from the spatial object.

Converting an sf polygon object to an owin object

The spatstat package is normally used to analyze point patterns however, in most cases, the study extent needs to be explicitly defined by a polygon object. The polygon should be of class owin. Conversion from sf to owin requires the use of the maptools package.

Note that the attribute table gets stripped from the polygon data. This is usually fine given that the only reason for converting a polygon to an owin format is for delineating the study boundary.

[1] "owin"

Converting an sf point object to a ppp object

As of this writing, it seems that you need to first convert the sf object to a SpatialPoints* before creating a ppp object as shown in the following code chunk. Note that the maptools package is required for this step.

[1] "ppp"

If the point layer has an attribute table, its attributes will be converted to ppp marks.

Converting a raster object to an im object (spatstat)

The maptools package will readily convert a raster object to an im object using the as.im.RasterLayer() function.

[1] "im"

Converting to an sf object

All aforementioned spatial formats, except owin, can be coerced to an sf object via the st_as_sf function. for example:

Dissecting the sf file object

Simple feature collection with 3 features and 5 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569
epsg (SRID):    26919
proj4string:    +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs
         NAME Income   NoSchool NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.00140696  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.00115002  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.00212896  724.242 MULTIPOLYGON (((445039.5 51...

The first line of output gives us the geometry type, MULTIPOLYGON, a multi-polygon data type. This is also referred to as a multipart polygon. A single-part sf polygon object will adopt the POLYGON geometry.

The next few lines of output give us the layer’s bounding extent in the layer’s native coordinate system units. You can extract the extent via the extent() function as in extent(s.sf).

The next lines indicate the coordinate system used to define the polygon feature locations. It’s offered in two formats: epsg code (when available) and PROJ4 string format. You can extract the coordinate system information from an sf object via:

Coordinate Reference System:
  EPSG: 26919 
  proj4string: "+proj=utm +zone=19 +datum=NAD83 +units=m +no_defs"

The output of this call is an object of class crs. Extracting the reference system from a spatial object can prove useful in certain workflows.

What remains of the sf summary output is the first few records of the attribute table. You can extract the object’s table to a dedicated data frame via:

[1] "data.frame"
         NAME Income   NoSchool  NoSchoolSE IncomeSE                       geometry
1   Aroostook  21024 0.01338720 0.001406960  250.909 MULTIPOLYGON (((513821.1 51...
2    Somerset  21025 0.00521153 0.001150020  390.909 MULTIPOLYGON (((379071.8 50...
3 Piscataquis  21292 0.00633830 0.002128960  724.242 MULTIPOLYGON (((445039.5 51...
4   Penobscot  23307 0.00684534 0.001025450  242.424 MULTIPOLYGON (((472271.3 49...
5  Washington  20015 0.00478188 0.000966036  327.273 MULTIPOLYGON (((645446.5 49...

The above chunk will also create a geometry column. This column is somewhat unique in that it stores its contents as a list of geometry coordinate pairs (polygon vertex coordinate values in this example).

'data.frame':   16 obs. of  6 variables:
 $ NAME      : Factor w/ 16 levels "Androscoggin",..: 2 13 11 10 15 4 9 14 6 1 ...
 $ Income    : int  21024 21025 21292 23307 20015 21744 21885 23020 25652 24268 ...
 $ NoSchool  : num  0.01339 0.00521 0.00634 0.00685 0.00478 ...
 $ NoSchoolSE: num  0.001407 0.00115 0.002129 0.001025 0.000966 ...
 $ IncomeSE  : num  251 391 724 242 327 ...
 $ geometry  :sfc_MULTIPOLYGON of length 16; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:32, 1:2] 513821 513806 445039 422284 424687 ...
  ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"

You can also opt to remove this column prior to creating the dataframe as follows:

[1] "data.frame"
         NAME Income   NoSchool  NoSchoolSE IncomeSE
1   Aroostook  21024 0.01338720 0.001406960  250.909
2    Somerset  21025 0.00521153 0.001150020  390.909
3 Piscataquis  21292 0.00633830 0.002128960  724.242
4   Penobscot  23307 0.00684534 0.001025450  242.424
5  Washington  20015 0.00478188 0.000966036  327.273

Exporting to different data file formats

You can export an sf object to many different spatial file formats such as a shapefile or a geopackage.

You can see a list of writable vector formats via a call to subset(rgdal::ogrDrivers(), write == TRUE). Only as subset of the output is shown in the following example. Note that supported file formats will differ from platform to platform.

             name                       long_name write  copy isVector
17 ESRI Shapefile                  ESRI Shapefile  TRUE FALSE     TRUE
18     Geoconcept                      Geoconcept  TRUE FALSE     TRUE
19        GeoJSON                         GeoJSON  TRUE FALSE     TRUE
21         GeoRSS                          GeoRSS  TRUE FALSE     TRUE
22            GFT            Google Fusion Tables  TRUE FALSE     TRUE
23            GML Geography Markup Language (GML)  TRUE FALSE     TRUE
24           GPKG                      GeoPackage  TRUE  TRUE     TRUE

The value in the name column is the driver name used in the st_write() function.

To export a raster to a data file, use writeRaster() from the raster package.

You can see a list of writable raster formats via a call to subset(rgdal::gdalDrivers(), create == TRUE).

        name                            long_name create  copy isRaster
43      GPKG                           GeoPackage   TRUE  TRUE     TRUE
46     GS7BG Golden Software 7 Binary Grid (.grd)   TRUE  TRUE     TRUE
48      GSBG   Golden Software Binary Grid (.grd)   TRUE  TRUE     TRUE
50     GTiff                              GeoTIFF   TRUE  TRUE     TRUE
51       GTX             NOAA Vertical Datum .GTX   TRUE FALSE     TRUE
54 HDF4Image                         HDF4 Dataset   TRUE FALSE     TRUE
58       HFA          Erdas Imagine Images (.img)   TRUE  TRUE     TRUE

The value in the name column is the format parameter name used in the writeRaster() function.