Basic GIS operations with R and sf

Last updated on 2024-12-03 | Edit this page

Overview

Questions

  • How to perform basic GIS operations with the sf package?

Objectives

After completing this episode, participants should be able to…

  • Perform geoprocessing operations such as unions, joins and intersections with dedicated functions from the sf package
  • Compute the area of spatial polygons
  • Create buffers and centroids
  • Map and save the results

R

library(tidyverse)
library(sf)
library(osmdata)
library(leaflet)
library(lwgeom)
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))

Why sf for GIS?


As introduced in an earlier lesson, sf is a package which supports simple features (sf), “a standardized way to encode spatial vector data.”. It contains a large set of functions to achieve all the operations on vector spatial data for which you might use traditional GIS software: change the coordinate system, join layers, intersect or unite polygons, create buffers and centroids, etc. cf. the sf cheatsheet.

Conservation in Brielle, NL

Let’s focus on old buildings and imagine we’re in charge of their conservation. We want to know how much of the city would be affected by a non-construction zone of 100m around pre-1800 buildings.

Let’s select them and see where they are.

R

bb <- osmdata::getbb("Brielle, NL")
x <- opq(bbox = bb) %>%
   add_osm_feature(key = 'building') %>%
    osmdata_sf()
buildings <- x$osm_polygons %>%
  st_transform(.,crs=28992)


summary(buildings$start_date)

OUTPUT

   Length     Class      Mode
    10626 character character 

R

old <- 1800  # year prior to which you consider a building old

buildings$start_date <- as.numeric(buildings$start_date)

old_buildings <- buildings %>%
  filter(start_date <= old)

 ggplot(data = old_buildings) + 
   geom_sf(colour="red") +
   coord_sf(datum = st_crs(28992))

Overpass query unavailable without internet

If you encounter an error linked to your internet proxy (“Error: Overpass query unavailable without internet R”), run this line of code. It might not be needed, but ensures that your machine knows it has internet.

R

assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))

As conservationists, we want to create a zone around historical buildings where building regulation will have special restrictions to preserve historical buildings.

Buffers


Let’s say the conservation zone should be 100 meters. In GIS terms, we want to create a buffer around polygons. The corresponding sf function is st_buffer(), with 2 arguments: the polygons around which to create buffers, and the radius of the buffer.

R

distance <- 100 # in meters 
 
#First, we check that the "old_buildings" layer projection is measured in meters:
st_crs(old_buildings)

OUTPUT

Coordinate Reference System:
  User input: EPSG:28992
  wkt:
PROJCRS["Amersfoort / RD New",
    BASEGEOGCRS["Amersfoort",
        DATUM["Amersfoort",
            ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4289]],
    CONVERSION["RD New",
        METHOD["Oblique Stereographic",
            ID["EPSG",9809]],
        PARAMETER["Latitude of natural origin",52.1561605555556,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",5.38763888888889,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9999079,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",155000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",463000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone."],
        BBOX[50.75,3.2,53.7,7.22]],
    ID["EPSG",28992]]

R

#then we use `st_buffer()`
buffer_old_buildings <- 
  st_buffer(x = old_buildings, dist = distance)
 
ggplot(data = buffer_old_buildings) + 
  geom_sf() +   
  coord_sf(datum = st_crs(28992))

Union


Now, we have a lot of overlapping buffers. We would rather create a unique conservation zone rather than overlapping ones in that case. So we have to fuse the overlapping buffers into one polygon. This operation is called union and the corresponding function is st_union().

R

single_old_buffer <- st_union(buffer_old_buildings) %>%
  st_cast(to = "POLYGON") %>%
  st_as_sf() 

single_old_buffer<- single_old_buffer %>%
  mutate("ID"=as.factor(1:nrow(single_old_buffer))) %>%
  st_transform(.,crs=28992) 

We also use st_cast() to explicit the type of the resulting object (POLYGON instead of the default MULTIPOLYGON) and st_as_sf() to transform the polygon into an sf object. With this function, we ensure that we end up with an sf object, which was not the case after we forced the union of old buildings into a POLYGON format.

We create unique IDs to identify the new polygons.

Centroids


For the sake of visualisation speed, we would like to represent buildings by a single point (for instance: their geometric centre) rather than their actual footprint. This operation means defining their centroid and the corresponding function is st_centroid().

R

sf::sf_use_s2(FALSE)  # s2 works with geographic projections, so to calculate centroids in projected CRS units (meters), we need to disable it.

centroids_old <- st_centroid(old_buildings) %>%
  st_transform(.,crs=28992)  

ggplot() + 
    geom_sf(data = single_old_buffer, aes(fill=ID)) +
    geom_sf(data = centroids_old) +
    coord_sf(datum = st_crs(28992))

Intersection


Now, we would like to distinguish conservation areas based on the number of historic buildings they contain. In GIS terms, we would like to know how many centroids each fused buffer polygon contains. This operation means intersecting the layer of polygons with the layer of points and the corresponding function is st_intersection().

R

 centroids_buffers <- 
  st_intersection(centroids_old,single_old_buffer) %>%
  mutate(n = 1)

 centroid_by_buffer <- centroids_buffers %>%
   group_by(ID) %>%
   summarise(n = sum(n))
 
 single_buffer <- single_old_buffer %>%
   mutate(n_buildings = centroid_by_buffer$n)

 
  ggplot() + 
   geom_sf(data = single_buffer, aes(fill = n_buildings)) +
   scale_fill_viridis_c(alpha = 0.8,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") +
      coord_sf(datum = st_crs(28992))

st_intersection here adds the attributes of the intersected polygon buffers to the data table of the centroids. This means we will now know about each centroid, the ID of its intersected polygon-buffer, and a variable called “n” which is population with 1 for everyone. This means that all centroids will have the same weight when aggregated.

We aggregate them by ID number (group_by(ID)) and sum the variable n to know how many centroids are contained in each polygon-buffer.

Final output:

Let’s map this layer over the initial map of individual buildings, and save the result.

R

p <- ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") +
    coord_sf(datum = st_crs(28992))

 p 

R

ggsave(filename = "fig/ConservationBrielle.png", 
       plot = p)

Challenge: Conservation rules have changed.

The historical threshold now applies to all pre-war buildings, but the distance to these building is reduced to 10m. Can you map the number of all buildings per 10m fused buffer?

R

old <- 1939 
distance <- 10

# select
old_buildings <- buildings %>%
  filter(start_date <= old)

# buffer
buffer_old_buildings <- st_buffer(old_buildings, dist = distance)
  
# union
single_old_buffer <- st_union(buffer_old_buildings) %>%
  st_cast(to = "POLYGON") %>%
  st_as_sf()  
 
single_old_buffer <- single_old_buffer %>%
  mutate("ID"=1:nrow(single_old_buffer))  %>%
  st_transform(single_old_buffer,crs=4326) 

# centroids
centroids_old <- st_centroid(old_buildings) %>%
  st_transform(.,crs=4326)  
  
# intersection
centroids_buffers <- st_intersection(centroids_old,single_old_buffer) %>%
  mutate(n=1)
 
centroid_by_buffer <- centroids_buffers %>% 
  group_by(ID) %>%
  summarise(n = sum(n))
  
single_buffer <- single_old_buffer %>% 
  mutate(n_buildings = centroid_by_buffer$n)
 
pnew <- ggplot() + 
    geom_sf(data = buildings) +
    geom_sf(data = single_buffer, aes(fill = n_buildings), colour = NA) +
    scale_fill_viridis_c(alpha = 0.6,
                         begin = 0.6,
                         end = 1,
                         direction = -1,
                         option = "B")  +
    coord_sf(datum = st_crs(28992))
  
  pnew 

R

ggsave(filename = "fig/ConservationBrielle_newrules.png", 
       plot = pnew)

Problem: there are many pre-war buildings and the buffers are large so the number of old buildings is not very meaningful. Let’s compute the density of old buildings per buffer zone.

Area


R

single_buffer$area <- sf::st_area(single_buffer)  %>% 
  units::set_units(., km^2)

single_buffer$old_buildings_per_km2 <- as.numeric(single_buffer$n_buildings / single_buffer$area)

 ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=old_buildings_per_km2), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Key Points

  • Use the st_* functions from sf for basic GIS operations
  • Perform unions, joins and intersection operations
  • Compute the area of spatial polygons with st_area()