CS Weekly 5 - Simple Features in R

Last week I compared some of the differences between sf and sp. In this post, I'll focus on the basics of sf and the basic data and geometry types provided by sf.

Simple features referes to a formal standard (ISO 19125-1:2004) that describes how geographic objects in the real world can be represented in computers. The standard is widely implemented in many spatial and GIS software (ESRI, PostGIS).

For years, the standard spatial library in R was sp. It supported many of the spatial data types required for spatial analysis. However, it was created before the creation of the simple features standard. The sf library attempts to reconcile spatial data analysis in R with the simple feature standard. The long term goal is for sf to succeed sp.

Simple features

A feature is a thing located some where on Earth. This thing can be a tree, a stream, a road, a lake, an island, etc. Features can be 1 thing, like a tree, or many things, like all the islands of Hawaii. So a set of features can form a single object, like all the trees in a forest.

Lets provide a simple example, all the states in the United States. There are 50 states, if we were to put these states into a data.frame it would have 50 rows, each row is a state. Most states are a single polygon. For instance, Colorado only has a single shape (or geometry) associated with it. Hawaii on the other hand has several shapes associated with it, one for each island.

These shapes associated with each state are called geometries. Every other piece of data, population, area, etc., are called attributes. Together, the geometry and attributes create a spatial data set.

Geometries

All geometries are composed of points. In the simplest case, coordinates in 2 dimensions (3 and 4 dimensions are possible) specifying the X and Y location of the point. Below are the 7 most common simple feature geometries.

  • POINT - geometry containing a single point, zero-dimension
  • LINESTRING - a sequence of points connected by a straight line segments, one-dimension
  • POLYGON - a sequence of points that form a closed shape, two-dimension
  • MULTIPOINT - a set of points
  • MULTILINESTRING - a set of linestrings
  • MULTIPOLYGON - a set of polygons
  • GEOMETRYCOLLECTION - a set of geometries of any type

Creating Geometries

The basic structure of a geometry is called a simple feature geometry, or sfg in R. Creating any of these primitive types if fairly easy. Almost all the functions from sf are prefixed with st_.

library(sf)

# create sfg ----
## points ----
p1 <- st_point(c(0, 1))
p2 <- st_point(c(-1, 0))
p3 <- st_point(c(0.5, 0))
p4 <- st_point(c(1, 1))

## linestring ----
l1 <- st_linestring(matrix(c(-1, -1, -0.5, 1), ncol = 2))

## polygon ----
poly1 <- st_polygon(
  list(
    rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))
  )
)

## plot these
plot(poly1, col = 'gray', border = 'blue')
plot(l1, col = 'green', lwd = 5, add = T)
plot(p1, col = 'orange', pch = 19, cex = 2, add = T)
plot(p2, col = 'purple', pch = 17, cex = 2, add = T)
plot(p3, col = 'red', pch = 15, cex = 2, add = T)
plot(p4, col = 'brown', pch = 3, cex = 2, lwd = 5, add = T)

Here we created simple geometries. The resulting plot should look like the following image.

example sf geometries

All of the base R plotting functionality is available when plotting sf objects.

Creating any of the MULTI* requires matrices, or lists of matrices for each elements in the feature set. For MULTIPOINT the data needs to be a matrix where each row is a point. For MULTILINESTRING or MULTIPOLYGON the data needs to be a list of matrices, each row in a matrix is point, for each geometry in the set. Check the documentation for more information about these geometry types. While these are common, I won't cover them here.

Sets of geometries

In the section above geometries were handled individually. It is typical to work with them as sets. sf calls these sets of geometries simple feature geometry column, sfc. To create one, combine individual geometries and a coordinate reference system.

sf_sfc <- st_sfc(p1, p2, p3, p4, crs = 4326)
sf_sfc

#> Geometry set for 4 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -1 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> POINT (0 1)
#> POINT (-1 0)
#> POINT (0.5 0)
#> POINT (1 1)

class(sf_sfc)
#> [1] "sfc_POINT" "sfc"

names(attributes(sf_sfc))
#> [1] "class"     "precision" "bbox"      "crs"       "n_empty"

The code block above created an sfc from the points in the first example. The printing the object shows it's contents, as well as several spatial attributes associated with it. Check the class and the names of the metadata of the sfc. We can plot this object, and like above, all the functionality of the plot function is available.

Thinking back to the US states example, all the states will comprise the sfc. Each state, a sfg, will be a row in the sfc. The geometry type for the sfg will be MULTIPOLYGON because some states can't be drawn as a single polygon (Michigan and Hawaii).

It is possible to combine different geometries into a single sfc. In these cases the geometry type is set to GEOMETRY.

Geometries and attributes

What about attribute data associated with each point. Think back to the forest example. Each tree is a point, which is stored as an simple feature geometry, sfg. The set of trees in the forest are combined into a simple feature geometry column, sfc. To combine attribute data, such as height and diameter, to the sfc we can use an sf object.

sf objects are tibbles or data.frames. If you are familiar with the tidyverse, think of them as tibbles, if not, think of them as data.frames. Creating these sf objects from tables with x and y coordinates are simple, use the st_as_sf function. However, let's extend the example we've been building up this entire article.

# create attribute data for the trees: height and diameter of the trunk
ht <- rnorm(4, 75, 10)
diam <- rnorm(4, 2.5, 1)

# create sf object, a tibble like structure with these data
tree_sf <- sf::st_sf(
  ## list of data, columns to bind together, one must be a geometry
  list(height = ht,
       diameter = diam,
       sf_sfc),
  crs = 4326)

tree_sf
#> Simple feature collection with 4 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -1 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>     height diameter           NA.
#> 1 75.06560 3.800921   POINT (0 1)
#> 2 66.62554 2.692367  POINT (-1 0)
#> 3 86.64016 1.536680 POINT (0.5 0)
#> 4 73.31864 3.474714   POINT (1 1)

plot(tree_sf, pch = 19, cex = 1.5)

The function st_sf first parameter is .... This should be a list of columns (equal length vectors to be coerced into a data.frame), one of which is a geometry column. Always add a CRS! Without a CRS we won't know where on earth the data are located.

The plot function works a little differently here. Instead of plotting just the points, it takes each attributes, scales the color based on the data for the attribute, and plots each attribute as a separate plot. Pretty neat default behavior.

sf default plot

Moving to sf

The sf library is maturing quickly. More and more packages are playing nicely, and when they don't, converting sf to sp objects is simple: as(sf, 'Spatial') (replace sf with your sf object). This function will take your sf object and converted to the proper sp object.

Below are two of the best resources I've encountered for spatial data analysis in R. Both use sf, and are free online!

Geocomputation with R - My introduction to sf came from this book.

Spatial Data Science - Written by Edzer Pebesma and Roger Bivand, two of the maintainers of sf and sp. A work in progress.

Finally, the website for sf is a great resource with many vignettes.