CS Weekly 5 - Simple Features in R
Last week I compared some of the differences between
sp. In this post, I'll focus on the basics of
sf and the basic data and geometry types provided by
Simple features referes to a formal standard (ISO 19125-1:2004) that describes how geographic objects in the real world can be represented in computers. The standard is widely implemented in many spatial and GIS software (ESRI, PostGIS).
For years, the standard spatial library in R was
sp. It supported many of the spatial data types required for spatial analysis. However, it was created before the creation of the simple features standard. The
sf library attempts to reconcile spatial data analysis in R with the simple feature standard. The long term goal is for
sf to succeed
A feature is a thing located some where on Earth. This thing can be a tree, a stream, a road, a lake, an island, etc. Features can be 1 thing, like a tree, or many things, like all the islands of Hawaii. So a set of features can form a single object, like all the trees in a forest.
Lets provide a simple example, all the states in the United States. There are 50 states, if we were to put these states into a
data.frame it would have 50 rows, each row is a state. Most states are a single polygon. For instance, Colorado only has a single shape (or geometry) associated with it. Hawaii on the other hand has several shapes associated with it, one for each island.
These shapes associated with each state are called geometries. Every other piece of data, population, area, etc., are called attributes. Together, the geometry and attributes create a spatial data set.
All geometries are composed of points. In the simplest case, coordinates in 2 dimensions (3 and 4 dimensions are possible) specifying the X and Y location of the point. Below are the 7 most common simple feature geometries.
POINT- geometry containing a single point, zero-dimension
LINESTRING- a sequence of points connected by a straight line segments, one-dimension
POLYGON- a sequence of points that form a closed shape, two-dimension
MULTIPOINT- a set of points
MULTILINESTRING- a set of linestrings
MULTIPOLYGON- a set of polygons
GEOMETRYCOLLECTION- a set of geometries of any type
The basic structure of a geometry is called a simple feature geometry, or
sfg in R. Creating any of these primitive types if fairly easy. Almost all the functions from
sf are prefixed with
library(sf) # create sfg ---- ## points ---- p1 <- st_point(c(0, 1)) p2 <- st_point(c(-1, 0)) p3 <- st_point(c(0.5, 0)) p4 <- st_point(c(1, 1)) ## linestring ---- l1 <- st_linestring(matrix(c(-1, -1, -0.5, 1), ncol = 2)) ## polygon ---- poly1 <- st_polygon( list( rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1)) ) ) ## plot these plot(poly1, col = 'gray', border = 'blue') plot(l1, col = 'green', lwd = 5, add = T) plot(p1, col = 'orange', pch = 19, cex = 2, add = T) plot(p2, col = 'purple', pch = 17, cex = 2, add = T) plot(p3, col = 'red', pch = 15, cex = 2, add = T) plot(p4, col = 'brown', pch = 3, cex = 2, lwd = 5, add = T)
Here we created simple geometries. The resulting plot should look like the following image.
All of the base R plotting functionality is available when plotting
Creating any of the
MULTI* requires matrices, or lists of matrices for each elements in the feature set. For
MULTIPOINT the data needs to be a matrix where each row is a point. For
MULTIPOLYGON the data needs to be a list of matrices, each row in a matrix is point, for each geometry in the set. Check the documentation for more information about these geometry types. While these are common, I won't cover them here.
Sets of geometries
In the section above geometries were handled individually. It is typical to work with them as sets.
sf calls these sets of geometries simple feature geometry column,
sfc. To create one, combine individual geometries and a coordinate reference system.
sf_sfc <- st_sfc(p1, p2, p3, p4, crs = 4326) sf_sfc #> Geometry set for 4 features #> geometry type: POINT #> dimension: XY #> bbox: xmin: -1 ymin: 0 xmax: 1 ymax: 1 #> epsg (SRID): 4326 #> proj4string: +proj=longlat +datum=WGS84 +no_defs #> POINT (0 1) #> POINT (-1 0) #> POINT (0.5 0) #> POINT (1 1) class(sf_sfc) #>  "sfc_POINT" "sfc" names(attributes(sf_sfc)) #>  "class" "precision" "bbox" "crs" "n_empty"
The code block above created an
sfc from the points in the first example. The printing the object shows it's contents, as well as several spatial attributes associated with it. Check the class and the names of the metadata of the
sfc. We can plot this object, and like above, all the functionality of the plot function is available.
Thinking back to the US states example, all the states will comprise the
sfc. Each state, a
sfg, will be a row in the
sfc. The geometry type for the
sfg will be
MULTIPOLYGON because some states can't be drawn as a single polygon (Michigan and Hawaii).
It is possible to combine different geometries into a single
sfc. In these cases the geometry type is set to
Geometries and attributes
What about attribute data associated with each point. Think back to the forest example. Each tree is a point, which is stored as an simple feature geometry,
sfg. The set of trees in the forest are combined into a simple feature geometry column,
sfc. To combine attribute data, such as height and diameter, to the
sfc we can use an
sf objects are
data.frames. If you are familiar with the tidyverse, think of them as
tibbles, if not, think of them as
data.frames. Creating these
sf objects from tables with x and y coordinates are simple, use the
st_as_sf function. However, let's extend the example we've been building up this entire article.
# create attribute data for the trees: height and diameter of the trunk ht <- rnorm(4, 75, 10) diam <- rnorm(4, 2.5, 1) # create sf object, a tibble like structure with these data tree_sf <- sf::st_sf( ## list of data, columns to bind together, one must be a geometry list(height = ht, diameter = diam, sf_sfc), crs = 4326) tree_sf #> Simple feature collection with 4 features and 2 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -1 ymin: 0 xmax: 1 ymax: 1 #> epsg (SRID): 4326 #> proj4string: +proj=longlat +datum=WGS84 +no_defs #> height diameter NA. #> 1 75.06560 3.800921 POINT (0 1) #> 2 66.62554 2.692367 POINT (-1 0) #> 3 86.64016 1.536680 POINT (0.5 0) #> 4 73.31864 3.474714 POINT (1 1) plot(tree_sf, pch = 19, cex = 1.5)
st_sf first parameter is
.... This should be a list of columns (equal length vectors to be coerced into a
data.frame), one of which is a geometry column. Always add a CRS! Without a CRS we won't know where on earth the data are located.
The plot function works a little differently here. Instead of plotting just the points, it takes each attributes, scales the color based on the data for the attribute, and plots each attribute as a separate plot. Pretty neat default behavior.
sf library is maturing quickly. More and more packages are playing nicely, and when they don't, converting
sp objects is simple:
as(sf, 'Spatial') (replace sf with your sf object). This function will take your
sf object and converted to the proper
Below are two of the best resources I've encountered for spatial data analysis in R. Both use
sf, and are free online!
Geocomputation with R - My introduction to
sf came from this book.
Spatial Data Science - Written by Edzer Pebesma and Roger Bivand, two of the maintainers of
sp. A work in progress.
Finally, the website for
sf is a great resource with many vignettes.