CS Weekly 4 - Comparing sf and sp

Simple features is an Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO) standard that specifies storage and access models for spatial data. The R package sf implements simple features in R, and is viewed as a replacement for the sp package. These packages share many similar methods, work nicely with base R, and share contributors.

In this post I'll show some very simple differences between the packages.

Data

We'll work with a slightly contrived example, but it is the easiest way to get started. First, install sf, sp, RSQLite.

install.packages(c('sf', 'sp', 'RSQLite'))

library(sf)
library(sp)
library(RSQLite)

# get muese data
db <- system.file('sqlite/meuse.sqlite', package = 'sf')
dbcon <- dbConnect(dbDriver('SQLite'), db)
s <- st_read(dbcon, 'meuse.sqlite')

## create data frame of data
pts_data <- cbind(st_drop_geometry(s), st_coordinates(s))

This is a very contrived example. Normally we would read in a csv file with x and y coordinates. To save time, we will used a built in dataset, the meuse data. This is saved as a SQLite file, hence the need for RSQLite.

Create sp Objects

Creating a sp objects is relatively. Points are the easiest spatial data type to deal with. The code below will create a SpatialPointsDataFrame.

pts_sp <- sp::SpatialPointsDataFrame(
  data = pts_data,
  coords = pts_data[, c('x', 'y')],
  proj4string = CRS('+init=epsg:28992')
)

Here is a rundown of the parameters for the function above.

  • data: the data.frame to be turned into a sp object
  • coords: a matrix, or data type that can be coerced to a matrix, of the x and y coordinates
  • proj4string: a coordinate reference system, in this case, the EPSG ID as a PROJ4 string.

Structure

sp objects are S4 classes with many attributes. This can be checked with the mode(pts_sp) function call. In S4 classes, objects can be accessed with the @ sign.

str(pts_sp)
#> Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
#>   ..@ data       :'data.frame':	155 obs. of  15 variables:
#>   .. ..$ ogc_fid: int [1:155] 1 2 3 4 5 6 7 8 9 10 ...
#>   .. ..$ cadmium: num [1:155] 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
#>   .. ..$ copper : num [1:155] 85 81 68 81 48 61 31 29 37 24 ...
#>   .. ..$ lead   : num [1:155] 299 277 199 116 117 137 132 150 133 80 ...
#>   .. ..$ zinc   : num [1:155] 1022 1141 640 257 269 ...
#>   .. ..$ elev   : num [1:155] 7.91 6.98 7.8 7.66 7.48 ...
#>   .. ..$ dist   : num [1:155] 0.00136 0.01222 0.10303 0.19009 0.27709 ...
#>   .. ..$ om     : num [1:155] 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
#>   .. ..$ ffreq  : chr [1:155] "1" "1" "1" "1" ...
#>   .. ..$ soil   : chr [1:155] "1" "1" "1" "2" ...
#>   .. ..$ lime   : chr [1:155] "1" "1" "1" "0" ...
#>   .. ..$ landuse: chr [1:155] "Ah" "Ah" "Ah" "Ga" ...
#>   .. ..$ dist.m : num [1:155] 50 30 150 270 380 470 240 120 240 420 ...
#>   .. ..$ x      : num [1:155] 181072 181025 181165 181298 181307 ...
#>   .. ..$ y      : num [1:155] 333611 333558 333537 333484 333330 ...
#>   ..@ coords.nrs : num(0) 
#>   ..@ coords     : num [1:155, 1:2] 181072 181025 181165 181298 181307 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:155] "1" "2" "3" "4" ...
#>   .. .. ..$ : chr [1:2] "x" "y"
#>   ..@ bbox       : num [1:2, 1:2] 178605 329714 181390 333611
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:2] "x" "y"
#>   .. .. ..$ : chr [1:2] "min" "max"
#>   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
#>   .. .. ..@ projargs: chr "+init=epsg:28992"

mode(pts_sp)
#> [1] "S4"

## get proj4string attribute
pgs_sp@proj4string
#> CRS arguments: +init=epsg:28992

Plotting

The sp package extends base plotting in R. This means that you can use sp objects in base R's plot function. All the same plotting paramaters can be used. I'm only showing how the base plotting works here.

plot(sp)

sp-default plots

Subsetting

Subsetting based on data in the data attribute is very easy, and similar to how base R subsetting works on data.frames. You can see the results of subsetting by looking at the underlying data attribute, or visualizing the resulting object.

sub_pts_sp <- pts_sp[pts_sp$lead < 50, ]
plot(sub_pts_sp)

## the View() function opens a nice spreadsheet like view of the data in RStudio
View(sub_pts_sp)

Creating sf Objects

This is very similar to sp.

pts_sf <- sf::st_as_sf(
  x = pts_data,
  coords = c('x', 'y'),
  crs = 28992
)

And the rundown of the parameters for the function above

  • x: the data.frame to be turned into an sf object
  • coords: the name of the columns in the data.frame of x to be used as the coordinates
  • crs: the coordinate reference system

Structure

sf objects extend base R data.frames. The are their own class, but really, a list with some additional attributes. It looks a lot like a data.frame. The spatial data is stored in the a field called geometry. The geometry contains all the spatial information for the object.

str(pts_sf)
#> Classes ‘sf’ and 'data.frame':	155 obs. of  14 variables:
#>  $ ogc_fid : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ cadmium : num  11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
#>  $ copper  : num  85 81 68 81 48 61 31 29 37 24 ...
#>  $ lead    : num  299 277 199 116 117 137 132 150 133 80 ...
#>  $ zinc    : num  1022 1141 640 257 269 ...
#>  $ elev    : num  7.91 6.98 7.8 7.66 7.48 ...
#>  $ dist    : num  0.00136 0.01222 0.10303 0.19009 0.27709 ...
#>  $ om      : num  13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
#>  $ ffreq   : chr  "1" "1" "1" "1" ...
#>  $ soil    : chr  "1" "1" "1" "2" ...
#>  $ lime    : chr  "1" "1" "1" "0" ...
#>  $ landuse : chr  "Ah" "Ah" "Ah" "Ga" ...
#>  $ dist.m  : num  50 30 150 270 380 470 240 120 240 420 ...
#>  $ geometry:sfc_POINT of length 155; first list element:  'XY' num  181072 333611
#>  - attr(*, "sf_column")= chr "geometry"
#>  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
#>   ..- attr(*, "names")= chr  "ogc_fid" "cadmium" "copper" "lead" ...

mode(pts_sf)
#> [1] "list"

Plotting

Plotting in sf is very similar to sp. Just like sp, sf objects use the base R plotting function. However, it behaves a little differently. The default behavior of plotting sf objects is to plot every column of data, colored by values. This can be very handy, but can be time consuming. There are two different ways to plot just the geometry. I'll show them below.

plot(pts_sf)

## plot just geometry
plot(pts_sf[0])

## an alternative
plot(st_geometry(pts_sf))

sf default

Subsetting

Subsetting, again, works similarly to base R.

sub_pts_sf <- pts_sf[pts_sf$lead < 50, ]

## plot to see, remember to add the 0 to plot just the geometry
plot(sub_pts_sf[0])

sf objects can be subset with functions from the tidyverse. So the dplyr functions work here.

View(dplyr::filter(pts_sf, lead < 50))

Summary

I think it is safe to view sf as a complete replacement of sp. I have been using it for a year now, and find it much simpler to work with. There are a few pain points. For instance, when working with raster objects it is necessary to convert sf to sp objects. but this can be done within a funciton call, such as: raster::extract(raster_dat, as(sf_obj, 'Spatial)). Yes, a little more typing, but an easy solution.