This lesson is in the early stages of development (Alpha version)

# Crop raster data with rioxarray and geopandas

## Overview

Teaching: 40 min
Exercises: 20 min
Questions
• How can I crop my raster data to the area of interest?

Objectives
• Crop raster data with a bounding box.

• Crop raster data with a polygon.

• Crop raster data within a geometry buffer.

It is quite common that the raster data you have in hand is too large to process, or not all the pixels are relevant to your area of interest (AoI). In both situations, you should consider cropping your raster data before performing data analysis.

In this episode, we will introduce how to crop raster data into the desired area. We will use one Sentinel-2 image over Amsterdam as the example raster data, and introduce how to crop your data to different types of AoIs.

## Introduce the Data

We’ll continue from the results of the satellite image search that we have carried out in an exercise from a previous episode. We will load data starting from the `search.json` file.

If you would like to work with the data for this lesson without downloading data on-the-fly, you can download the raster data using this link. Save the `geospatial-python-raster-dataset.tar.gz` file in your current working directory, and extract the archive file by double-clicking on it or by running the following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`. Use the file `geospatial-python-raster-dataset/search.json` (instead of `search.json`) to get started with this lesson.

We also use the vector data that has already been introduced in episode Vector data in python.

## Crop raster data with a bounding box

We load a true color image using `pystac` and `rioxarray` and check the shape of the raster:

``````import pystac
import rioxarray

# Load image and inspect the shape
items = pystac.ItemCollection.from_file("search.json")
true_color_image = rioxarray.open_rasterio(items[1].assets["visual"].href) # Select a true color image
print(true_color_image.shape)
``````
``````(3, 10980, 10980)
``````

The large size of the raster data makes it time and memory consuming to visualise in its entirety. Instead, we can plot the “overview” asset, to investigate the coverage of the image.

``````# Get the overview asset
overview_image = rioxarray.open_rasterio(items[1].assets["overview"].href)
print(overview_image.shape)

# Visualize it
overview_image.plot.imshow(figsize=(8,8))
``````

As we can see, the overview image is much smaller compared to the original true color image. Therefore the visualization is much faster. If we are interested in the crop fields, then we would like to know where these are located in the image. To compare its coverage with the raster data, we first check the coordinate systems of both raster and vector data. For raster data, we use `pyproj.CRS`:

``````from pyproj import CRS

# Check the coordinate system
CRS(true_color_image.rio.crs)
``````
``````<Derived Projected CRS: EPSG:32631>
Name: WGS 84 / UTM zone 31N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 31N
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
``````

To open and check the coordinate system of vector data, we use `geopandas`:

``````import geopandas as gpd

# Load the polygons of the crop fields

# Check the coordinate system
cf_boundary_crop.crs
``````
``````<Derived Projected CRS: EPSG:28992>
Name: Amersfoort / RD New
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone.
- bounds: (3.2, 50.75, 7.22, 53.7)
Coordinate Operation:
- name: RD New
- method: Oblique Stereographic
Datum: Amersfoort
- Ellipsoid: Bessel 1841
- Prime Meridian: Greenwich
``````

As seen, the coordinate systems differ. To crop the raster using the shapefile, we first convert the coordinate system of `cf_boundary_crop` to the coordinate system of `true_color_image`, and then check the coverage:

``````from matplotlib import pyplot as plt

# Convert the coordinate system
cf_boundary_crop = cf_boundary_crop.to_crs(true_color_image.rio.crs)

# Plot
fig, ax = plt.subplots()
fig.set_size_inches((8,8))

# Plot image
overview_image.plot.imshow(ax=ax)

# Plot crop fields
cf_boundary_crop.plot(
ax=ax,
edgecolor="red",
)
``````

Seeing from the location of the polygons, the crop fields (red) only takes a small part of the raster. Therefore, before actual processing, we can first crop the raster to our area of interest. The `clip_box` function allows one to crop a raster by the min/max of the x and y coordinates. Note that we are cropping the original image `true_color_image` now, and not the overview image `overview_image`.

``````# Crop the raster with the bounding box
raster_clip_box = true_color_image.rio.clip_box(*cf_boundary_crop.total_bounds)
print(raster_clip_box.shape)
``````
``````(3, 1565, 1565)
``````

We successfully cropped the raster to a much smaller piece. We can visualize it now:

``````raster_clip_box.plot.imshow(figsize=(8,8))
``````

This cropped image can be saved for later usage:

``````raster_clip_box.rio.to_raster("raster_clip.tif")
``````

## Crop raster data with polygons

We have a cropped image around the fields. To further analyze the fields, one may want to crop the image to the exact field boundaries. This can be done with the `clip` function:

``````raster_clip_fields = raster_clip_box.rio.clip(cf_boundary_crop['geometry'])
``````

And we can visualize the results:

``````raster_clip_fields.plot.imshow(figsize=(8,8))
``````

We can save this image for later usage:

``````raster_clip_fields.rio.to_raster("crop_fields.tif")
``````

## Crop raster data with a geometry buffer

It is not always the case that the AoI comes in the format of polygon. Sometimes one would like to perform analysis around a (set of) point(s), or polyline(s). For example, in our AoI, there are also some groundwater monitoring wells available as point vector data. One may also want to perform analysis around these wells. The location of the wells is stored in `data/groundwater_monitoring_well`.

We can first load the wells vector data, and select wells within the coverage of the image:

``````# Load wells
wells = wells.to_crs(raster_clip_box.rio.crs)

# Crop the wells to the image extent
xmin, ymin, xmax, ymax = raster_clip_box.rio.bounds()
wells = wells.cx[xmin:xmax, ymin:ymax]
``````

Then we can check the location of the wells:

``````# Plot the wells over raster
fig, ax = plt.subplots()
fig.set_size_inches((8,8))
raster_clip_box.plot.imshow(ax=ax)
wells.plot(ax=ax, color='red', markersize=2)
``````

To select pixels around the geometries, one needs to first define a region including the geometries. This region is called a “buffer” and it is defined in the units of the projection. The size of the buffer depends on the analysis in your research. A buffer is also a polygon, which can be used to crop the raster data. `geopandas`’ objects have a `buffer` method to generate buffer polygons.

``````# Create 200m buffer around the wells
wells_buffer = wells.buffer(200)
``````

Now let’s see what do the buffers look like in the image:

``````# Visualize buffer on raster
fig, ax = plt.subplots()
fig.set_size_inches((8,8))
raster_clip_box.plot.imshow(ax=ax)
wells_buffer.plot(ax=ax, color='red')
``````

The red dots have grown larger indicating the conversion from points to buffer polygons.

## Exercise: Select the raster data around the wells

Now we have the buffer polygons around the groudwater monitoring wells, i.e. `wells_buffer`. Can you crop the image `raster_clip_box` to the buffer polygons? Can you visualize the results of cropping?

## Solution

``````# Crop
raster_clip_wells = raster_clip_box.rio.clip(wells_buffer)

# Visualize cropped buffer
raster_clip_wells.plot.imshow()
``````

## Exercise: Select the raster data around the waterways

In the previous episode we have corrected the waterway vector data and saved it in `waterways_nl_corrected.shp`. Can you select out all the raster data within 100m around the waterways, and visualize the results?

## Solution

``````# Load waterways polyline and convert CRS
waterways_nl = waterways_nl.to_crs(raster_clip_box.rio.crs)

# Crop the waterways to the image extent
waterways_nl = waterways_nl.cx[xmin:xmax, ymin:ymax]

# waterways buffer
waterways_nl_buffer = waterways_nl.buffer(100)

# Crop
raster_clip_waterways = raster_clip_box.rio.clip(waterways_nl_buffer)

# Visualize
raster_clip_waterways.plot.imshow(figsize=(8,8))
``````

## Crop raster data using another raster dataset

So far we have learned how to crop raster image with vector data. We can also crop a raster with another raster data. In this section, we will demonstrate how to crop the `true_color_image` image using the `crop_fields` image.

## Using `crop_fields` raster image

For this section, we will use the `crop_fields.tif` image that was produced in the section “Crop raster data with polygon”.

We read in the `crop_fields.tif` image. For the demonstration purpose, we will reproject it to the RD CRS system, so it will be in a different CRS from the `true_color_image`:

``````# Read crop_fields
crop_fields = rioxarray.open_rasterio("crop_fields.tif")

# Reproject to RD to make the CRS different from the "true_color_image"
crop_fields = crop_fields.rio.reproject("EPSG:28992")
CRS(crop_fields.rio.crs)
``````
``````<Derived Projected CRS: EPSG:28992>
Name: Amersfoort / RD New
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: unnamed
- method: Oblique Stereographic
Datum: Amersfoort
- Ellipsoid: Bessel 1841
- Prime Meridian: Greenwich
``````

And let’s check again the CRS of `true_color_image`:

``````# Get CRS of true_color_image
CRS(true_color_image.rio.crs)
``````
``````<Derived Projected CRS: EPSG:32631>
Name: WGS 84 / UTM zone 31N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 31N
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
``````

Now the two images are in different coordinate systems. We can use `rioxarray.reproject_match()` function to crop `true_color_image` image. It will perform both the reprojection and the cropping operation. This might take a few minutes, because the `true_color_image` image is large.

``````# Crop and reproject
cropped_raster = true_color_image.rio.reproject_match(crop_fields)

# Visualize
cropped_raster.plot.imshow(figsize=(8,8))
``````

In this way, we accomplish the reproject and cropping in one go.

## Exercise

This time let’s do it the other way around by cropping the `crop_fields` image using the `true_color_image` image. Discuss the results.

## Solution

``````# Crop
cropped_raster = crop_fields.rio.reproject_match(true_color_image)

# Visualize
cropped_raster.plot.imshow(figsize=(8,8))
``````

In one line `reproject_match` does a lot of helpful things:

1. It reprojects.
2. It matches the extent using `nodata` values or by clipping the data.
3. It sets `nodata` values. This means we can run calculations on those two images.

## Code Tip

As we saw before, there also exists a method called `reproject()`, which only reprojects one raster to another projection. If you want more control over how rasters are resampled, clipped, and/or reprojected, you can use the `reproject()` method and other `rioxarray` methods individually.

## Key Points

• Use `clip_box` in `DataArray.rio` to crop a raster with a bounding box.

• Use `clip` in `DataArray.rio` to crop a raster with a given polygon.

• Use `buffer` in `geopandas` to make a buffer polygon of a (multi)point or a polyline. This polygon can be used to crop data.

• Use `reproject_match` function in `DataArray.rio` to reproject and crop a raster data using another raster data.