First, you will need to download some sample files from the github repository. Make sure to set your R session folder to the directory where you will want to save the sample files before running the following code chunks.
download.file("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling.zip", destfile = "Income_schooling.zip" , mode='wb') unzip("Income_schooling.zip", exdir = ".") file.remove("Income_schooling.zip")
download.file("https://github.com/mgimond/Spatial/raw/main/Data/rail_inters.gpkg", destfile = "./rail_inters.gpkg", mode='wb')
download.file("https://github.com/mgimond/Spatial/raw/main/Data/elev.img", destfile = "./elev.img", mode='wb')
There are several different R spatial formats to choose from. Your choice of format will largely be dictated by the package(s) and or function(s) used in your workflow. A breakdown of formats and intended use are listed below.
|Data format||Used with…||Used in package…||Used for…||Comment|
||visualizing, manipulating, querying||This is likely to become the new spatial standard in R. Will also read from spatially enabled databases such as postgresSQL.|
||visualizing, manipulating, spatial statistics||This is the most versatile raster format|
||vector and raster||
||Visualizing, spatial statistics||
These are legacy formats.
||Point pattern analysis/statistics||NA|
||Point pattern analysis/statistics||NA|
There is an attempt at standardizing the spatial format in the R ecosystem by adopting a well established set of spatial standards known as simple features. This effort results in a recently developed package called
sf (Pebesma 2018). It is therefore recommended that you work in an
sf framework when possible. As of this writing, most of the basic data manipulation and visualization operations can be successfully conducted using
sf spatial objects.
Some packages such as
spatstat require specialized data object types. This tutorial will highlight some useful conversion functions for this purpose.
The following sections demonstrate different spatial data object creation strategies.
Shapefiles consist of many files sharing the same core filename and different suffixes (i.e. file extensions). For example, the sample shapefile used in this exercise consists of the following files:
 "Income_schooling.dbf" "Income_schooling.prj" "Income_schooling.sbn" "Income_schooling.sbx"  "Income_schooling.shp" "Income_schooling.shx"
Note that the number of files associated with a shapefile can vary.
sf only needs to be given the
*.shp name. It will then know which other files to read into R such as projection information and attribute table.
library(sf) <- st_read("Income_schooling.shp")s.sf
Let’s view the first few records in the spatial data object.
head(s.sf, n=4) # List spatial object and the first 4 attribute records
Simple feature collection with 4 features and 5 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569 Projected CRS: NAD83 / UTM zone 19N NAME Income NoSchool NoSchoolSE IncomeSE geometry 1 Aroostook 21024 0.01338720 0.00140696 250.909 MULTIPOLYGON (((513821.1 51... 2 Somerset 21025 0.00521153 0.00115002 390.909 MULTIPOLYGON (((379071.8 50... 3 Piscataquis 21292 0.00633830 0.00212896 724.242 MULTIPOLYGON (((445039.5 51... 4 Penobscot 23307 0.00684534 0.00102545 242.424 MULTIPOLYGON (((472271.3 49...
Note that the
sf object stores not only the geometry but the coordinate system information and attribute data as well. These will be explored later in this exercise.
A geopackage can store more than one layer. To list the layers available in the geopackage, type:
Driver: GPKG Available layers: layer_name geometry_type features fields crs_name 1 Interstate Multi Line String 35 1 NAD83 2 Rail Multi Line String 730 3 NAD83 / UTM zone 19N
In this example, we have two separate layers:
Rail. We can extract each layer separately via the
<- st_read("rail_inters.gpkg", layer="Interstate") inter.sf <- st_read("rail_inters.gpkg", layer="Rail")rail.sf
raster package will read many different raster file formats such as geoTiff, Imagine and HDF5 just to name a few. To see a list of supported raster file formats simply run
rgdal::gdalDrivers() at a command prompt. The
rgdal package is normally installed with your installation of
In the following example, an Imagine raster file is read into R.
library(raster) <- raster("elev.img")elev.r
What sets a
raster object apart from other R data file objects is its storage. By default, data files are loaded into memory but
raster objects are not. This can be convenient when working with raster files too large for memory. But this comes at a performance cost. If your RAM is large enough to handle your raster file, it’s best to load the entire dataset into memory.
To check if the
elev.r object is loaded into memory, run:
To force the raster into memory use
Let’s check that the raster is indeed loaded into memory:
Now let’s look at the raster’s properties:
class : RasterLayer dimensions : 994, 652, 648088 (nrow, ncol, ncell) resolution : 500, 500 (x, y) extent : 336630.3, 662630.3, 4759303, 5256303 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs source : memory names : elev values : 0, 1546 (min, max)
The raster object returns its grid dimensions (number of rows and columns), pixel size/resolution (in the layer’s coordinate system units), geographic extent, native coordinate system (UTM NAD83 Zone 19 with units of meters) and min/max raster values.
Geographic point data locations recorded in a spreadsheet can be converted to a spatial point object. Note that it’s important that you specify the coordinate system used to record the coordinate pairs since such information is not stored in a data frame. In the following example, the coordinate values are recorded in a WGS 1984 geographic coordinate system (
crs = 4326).
# Create a simple dataframe with lat/long values <- data.frame(lon = c(-68.783, -69.6458, -69.7653), df lat = c(44.8109, 44.5521, 44.3235), Name= c("Bangor", "Waterville", "Augusta")) # Convert the dataframe to a spatial object. Note that the # crs= 4326 parameter assigns a WGS84 coordinate system to the # spatial object <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326) p.sf p.sf
Simple feature collection with 3 features and 1 field Geometry type: POINT Dimension: XY Bounding box: xmin: -69.7653 ymin: 44.3235 xmax: -68.783 ymax: 44.8109 Geodetic CRS: WGS 84 Name geometry 1 Bangor POINT (-68.783 44.8109) 2 Waterville POINT (-69.6458 44.5521) 3 Augusta POINT (-69.7653 44.3235)
tidygeocoder package will convert street addresses to latitude/longitude coordinate pairs using a wide range of geocoding services such as the US census and Google. Some of these geocoding services will require an API key, others will not. Click here to see the list of geocoding services supported by
tidygeocoder and their geocoding limitations. In the example that follows, the
osm geocoding service is used by default.
library(tidygeocoder) options(pillar.sigfig = 7) # Increase significant digits in displayed output <- data.frame( dat name = c("Colby College", "Bates College", "Bowdoin College"), address = c("4000 Mayflower drive, Waterville, ME , 04901", "275 College st, Lewiston, ME 04240", "255 Maine St, Brunswick, ME 04011")) geocode(.tbl = dat, address = address, method = "osm")
# A tibble: 3 × 4 name address lat long <chr> <chr> <dbl> <dbl> 1 Colby College 4000 Mayflower drive, Waterville, ME , 04901 44.56146 -69.65849 2 Bates College 275 College st, Lewiston, ME 04240 44.10638 -70.20636 3 Bowdoin College 255 Maine St, Brunswick, ME 04011 43.90641 -69.96424
Another free (but manual) alternative, is to use the US Census Bureau’s web geocoding service for creating lat/lon values from a file of US street addresses. This needs to be completed via their web interface and the resulting data table (a CSV file) would then need to be loaded into R as a data frame.
Packages such as
spatsat currently do not support
sf objects. The following sections demonstrate methods to convert from
sf to other formats.
The following code will convert point, polyline or polygon features to a
spatial* object. While the current version of
spdep will now accept
sf objects, converting to
spatial* objects will be necessary with legacy
spdep packages. In this example, an
sf polygon feature is converted to a
<- as(s.sf, "Spatial") s.sp class(s.sp)
 "SpatialPolygonsDataFrame" attr(,"package")  "sp"
Note that if you wish to create a
Spatial* object directly from a shapefile (and bypass the
sf object creation), you could run the
readORG(".","Income_schooling.shp"). However, this former strips the coordinate system information from the spatial object.
spatstat package is normally used to analyze point patterns however, in most cases, the study extent needs to be explicitly defined by a polygon object. The polygon should be of class
owin. Conversion from
owin requires the use of the
Note that the attribute table gets stripped from the polygon data. This is usually fine given that the only reason for converting a polygon to an
owin format is for delineating the study boundary.
library(maptools) <- as(s.sp, "owin") s.owin class(s.owin)
As of this writing, it seems that you need to first convert the
sf object to a
SpatialPoints* before creating a
ppp object as shown in the following code chunk. Note that the
maptools package is required for this step.
<- as(p.sf, "Spatial") # Create Spatial* object p.sp <- as(p.sp, "ppp") # Create ppp objectp.ppp
Error in as.ppp.SpatialPointsDataFrame(from): Only projected coordinates may be converted to spatstat class objects
ppp object is associated with the
spatstat package which is designed to work off of a projected (cartesian) coordinate system. The error message reminds us that a geographic coordinate system (i.e. one that uses angular measurements such as latitude/longitude) cannot be used with this package. If you encounter this error, you will need to project the
ps.f layer to a projected coordinate system.
In this example, we’ll project the
p.sf object to a UTM coordinate system (
epsg=32619). Coordinate systems in R are treated in a separate appendix.
<- st_transform(p.sf, 32619) # project from geographic to UTM p.sf.utm <- as(p.sf.utm, "Spatial") # Create Spatial* object p.sp <- as(p.sp, "ppp") # Create ppp object p.ppp class(p.ppp)
Note that if the point layer has an attribute table, its attributes will be converted to
All aforementioned spatial formats, except
owin, can be coerced to an
sf object via the
st_as_sf function. for example:
st_as_sf(p.ppp) # For converting a ppp object to an sf object st_as_sf(s.sp) # For converting a Spatial* object to an sf object
Simple feature collection with 3 features and 5 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 379071.8 ymin: 4936182 xmax: 596500.1 ymax: 5255569 Projected CRS: NAD83 / UTM zone 19N NAME Income NoSchool NoSchoolSE IncomeSE geometry 1 Aroostook 21024 0.01338720 0.00140696 250.909 MULTIPOLYGON (((513821.1 51... 2 Somerset 21025 0.00521153 0.00115002 390.909 MULTIPOLYGON (((379071.8 50... 3 Piscataquis 21292 0.00633830 0.00212896 724.242 MULTIPOLYGON (((445039.5 51...
The first line of output gives us the geometry type,
MULTIPOLYGON, a multi-polygon data type. This is also referred to as a multipart polygon. A single-part
sf polygon object will adopt the
The next few lines of output give us the layer’s bounding extent in the layer’s native coordinate system units. You can extract the extent via the
extent() function as in
The following code chunk can be used to extract addition coordinate information from the data.
Depending on the version of the
PROJ library used by
sf, you can get two different outputs. If your version of
sf is built with a version of
PROJ older than
6.0, the output will consist of an epsg code (when available) and a proj4 string as follows:
: Coordinate Reference System: 26919 EPSG: "+proj=utm +zone=19 +datum=NAD83 +units=m +no_defs" proj4string
If your version of
sf is built with a version of
6.0 or greater, the output will consist of a user defined CS definition (e.g. an epsg code), if available, and a Well Known Text (WKT) formatted coordinate definition that consists of a series of
[ ] tags as follows:
Coordinate Reference System: User input: NAD83 / UTM zone 19N wkt: PROJCRS["NAD83 / UTM zone 19N", BASEGEOGCRS["NAD83", DATUM["North American Datum 1983", ELLIPSOID["GRS 1980",6378137,298.257222101, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4269]], CONVERSION["UTM zone 19N", METHOD["Transverse Mercator", ID["EPSG",9807]], PARAMETER["Latitude of natural origin",0, ANGLEUNIT["Degree",0.0174532925199433], ID["EPSG",8801]], PARAMETER["Longitude of natural origin",-69, ANGLEUNIT["Degree",0.0174532925199433], ID["EPSG",8802]], PARAMETER["Scale factor at natural origin",0.9996, SCALEUNIT["unity",1], ID["EPSG",8805]], PARAMETER["False easting",500000, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["(E)",east, ORDER, LENGTHUNIT["metre",1]], AXIS["(N)",north, ORDER, LENGTHUNIT["metre",1]], ID["EPSG",26919]]
The WKT format will usually start with a
PROJCRS[...] tag for a projected coordinate system, or a
GEOGCRS[...] tag for a geographic coordinate system. More information on coordinate systems in R can be found in the coordinate systems appendix.
What remains of the
sf summary output is the first few records of the attribute table. You can extract the object’s table to a dedicated data frame via:
<- data.frame(s.sf) s.df class(s.df)
NAME Income NoSchool NoSchoolSE IncomeSE geometry 1 Aroostook 21024 0.01338720 0.001406960 250.909 MULTIPOLYGON (((513821.1 51... 2 Somerset 21025 0.00521153 0.001150020 390.909 MULTIPOLYGON (((379071.8 50... 3 Piscataquis 21292 0.00633830 0.002128960 724.242 MULTIPOLYGON (((445039.5 51... 4 Penobscot 23307 0.00684534 0.001025450 242.424 MULTIPOLYGON (((472271.3 49... 5 Washington 20015 0.00478188 0.000966036 327.273 MULTIPOLYGON (((645446.5 49...
The above chunk will also create a geometry column. This column is somewhat unique in that it stores its contents as a list of geometry coordinate pairs (polygon vertex coordinate values in this example).
'data.frame': 16 obs. of 6 variables: $ NAME : chr "Aroostook" "Somerset" "Piscataquis" "Penobscot" ... $ Income : int 21024 21025 21292 23307 20015 21744 21885 23020 25652 24268 ... $ NoSchool : num 0.01339 0.00521 0.00634 0.00685 0.00478 ... $ NoSchoolSE: num 0.001407 0.00115 0.002129 0.001025 0.000966 ... $ IncomeSE : num 251 391 724 242 327 ... $ geometry :sfc_MULTIPOLYGON of length 16; first list element: List of 1 ..$ :List of 1 .. ..$ : num [1:32, 1:2] 513821 513806 445039 422284 424687 ... ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
You can also opt to remove this column prior to creating the dataframe as follows:
<- st_set_geometry(s.sf, NULL) s.nogeom.df class(s.nogeom.df)
NAME Income NoSchool NoSchoolSE IncomeSE 1 Aroostook 21024 0.01338720 0.001406960 250.909 2 Somerset 21025 0.00521153 0.001150020 390.909 3 Piscataquis 21292 0.00633830 0.002128960 724.242 4 Penobscot 23307 0.00684534 0.001025450 242.424 5 Washington 20015 0.00478188 0.000966036 327.273
You can export an
sf object to many different spatial file formats such as a shapefile or a geopackage.
st_write(s.sf, "shapefile_out.shp", driver="ESRI Shapefile") # create to a shapefile st_write(s.sf, " s.gpkg", driver="GPKG") # Create a geopackage file
You can see a list of writable vector formats via a call to
subset(rgdal::ogrDrivers(), write == TRUE). Only as subset of the output is shown in the following example. Note that supported file formats will differ from platform to platform.
name long_name write copy isVector 17 ESRI Shapefile ESRI Shapefile TRUE FALSE TRUE 20 FlatGeobuf FlatGeobuf TRUE FALSE TRUE 21 Geoconcept Geoconcept TRUE FALSE TRUE 22 GeoJSON GeoJSON TRUE FALSE TRUE 23 GeoJSONSeq GeoJSON Sequence TRUE FALSE TRUE 24 GeoRSS GeoRSS TRUE FALSE TRUE 25 GML Geography Markup Language (GML) TRUE FALSE TRUE
The value in the
name column is the driver name used in the
To export a raster to a data file, use
writeRaster() from the
writeRaster(elev.r, "elev_out.tif", format="GTiff" ) # Create a geoTiff file writeRaster(elev.r, "elev_out.img", format="HFA" ) # Create an Imagine raster file
You can see a list of writable raster formats via a call to
subset(rgdal::gdalDrivers(), create == TRUE).
name long_name create copy isRaster 36 ERS ERMapper .ers Labelled TRUE FALSE TRUE 46 GPKG GeoPackage TRUE TRUE TRUE 49 GS7BG Golden Software 7 Binary Grid (.grd) TRUE TRUE TRUE 51 GSBG Golden Software Binary Grid (.grd) TRUE TRUE TRUE 54 GTiff GeoTIFF TRUE TRUE TRUE 55 GTX NOAA Vertical Datum .GTX TRUE FALSE TRUE 58 HDF4Image HDF4 Dataset TRUE FALSE TRUE
The value in the
name column is the format parameter name used in the