Summary and Schedule
Introduction to Geospatial Raster and Vector Data with Python
In this lesson you will learn how to work with geospatial datasets and how to process them with Python. Python is one of the most popular programming languages for data science and analytics, with a large and steadily growing community in the field of Earth and Space Sciences. The lesson is meant for participants with a basic knowledge of Python and allows them to familiarize themselves with the world of geospatial raster and vector data. If you are unfamiliar with Python, useful resources to get started include Software Carpentry’s lesson “Programming with Python” and the book “Think Python” by Allen Downey. In the Introduction to Geospatial Raster and Vector Data with Python lesson you will be introduced to a set of tools from the Python ecosystem and learn how these can be used to carry out geospatial data analysis tasks. In particular, you will work with satellite images and open topographical geo-datasets, and learn how these spatial datasets can be accessed, explored, manipulated and visualized using Python.
Case study - Wildfires
As a case study for this lesson we will focus on wildfires. According to the IPCC assessment report, wildfire seasons are lengthening as a result of changes in temperature and increasing drought conditions. To analyse the impact of these wildfires, we will focus on the wildfire that occurred on the Greek island of Rhodes in the summer of 2023, which had a devastating effect and led to the evacuation of 19,000 people. In this lesson we are going to analyse the effect of this disaster by estimating which built-up areas were affected by the wildfire. Furthermore, we will analyse which vegetation and land-use types were affected the most, in order to understand which areas are more vulnerable to wildfires. The analysis that we set up provides insight into the effect of the wildfire and generates input for wildfire mitigation strategies.
Note that the analyses presented in this lesson are developed for educational purposes. Therefore, on some occasions the analysis steps have been simplified and assumptions have been made.
The data used in this lesson includes optical satellite images from the Copernicus Sentinel-2 mission and topographical data from OpenStreetMap (OSM). These datasets are real-world open data sets that entail sufficient complexity to teach many aspects of data analysis and management. The datasets have been selected to allow participants to focus on the core ideas and skills being taught while offering the chance to encounter common challenges with geospatial data. Furthermore, we have selected datasets which are available anywhere on Earth.
During this lesson we will set up an analysis pipeline which identifies scorched areas based on bands of satellite images collected after the disaster in July 2023. Next, we will calculate the Normalized Difference Vegetation Index (NDVI) to assess the vegetation cover of the areas before and after the wildfire. To investigate the affected built-up areas and main roads, we will use OSM vector data and compare them with the previously identified scorched areas.
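The NDVI step can be sketched for a single pixel as follows. This is a minimal illustration rather than the lesson's own implementation: the function name `ndvi` and the sample reflectance values are hypothetical, and the standard definition NDVI = (NIR - Red) / (NIR + Red) is assumed. For Sentinel-2, the near-infrared and red inputs are commonly taken from bands 8 and 4.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Return the NDVI of one pixel from near-infrared and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band (e.g. a nodata pixel): NDVI is undefined.
        return math.nan
    return (nir - red) / denominator


# Hypothetical reflectance values, chosen only for illustration.
healthy_vegetation = ndvi(nir=0.5, red=0.1)
scorched_ground = ndvi(nir=0.2, red=0.2)

print(f"healthy vegetation: {healthy_vegetation:.2f}")
print(f"scorched ground:    {scorched_ground:.2f}")
```

Values close to 1 generally indicate dense green vegetation, while values near zero or below are typical of bare or burned surfaces. On real rasters the same arithmetic is applied element-wise to whole band arrays instead of single numbers.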
To use this material most effectively, make sure to download the data and follow the software setup instructions before working through the lesson (this applies especially to learners who follow this lesson in a workshop).
| | Setup Instructions | Download files required for the lesson |
| Duration: 00h 00m | 1. Introduction to Raster Data | What format should I use to represent my data? What are the main data types used for representing geospatial data? What are the main attributes of raster data? |
| Duration: 00h 20m | 2. Introduction to Vector Data | What are the main attributes of vector data? |
| Duration: 00h 35m | 3. Coordinate Reference Systems | What is a coordinate reference system and how do I interpret one? |
| Duration: 01h 00m | 4. The Geospatial Landscape | What programs and applications are available for working with geospatial data? |
| Duration: 01h 10m | 5. Access satellite imagery using Python | Where can I find open-access satellite data? How do I search for satellite imagery with the STAC API? How do I fetch remote raster datasets using Python? |
| Duration: 01h 55m | 6. Read and visualize raster data | How is a raster represented by rioxarray? How do I read and plot raster data in Python? How can I handle missing data? |
| Duration: 03h 35m | 7. Vector data in Python | How can I read, inspect, and process spatial objects, such as points, lines, and polygons? |
| Duration: 04h 25m | 8. Crop raster data with rioxarray and geopandas | How can I crop my raster data to the area of interest? |
| Duration: 05h 05m | 9. Raster Calculations in Python | How do I perform calculations on rasters and extract pixel values for defined locations? |
| Duration: 06h 20m | 10. Calculating Zonal Statistics on Rasters | How to compute raster statistics on different zones delineated by vector data? |
| Duration: 07h 00m | 11. Parallel raster computations using Dask | How can I parallelize computations on rasters with Dask? How can I determine if parallelization improves calculation speed? What are good practices in applying parallelization to my raster calculations? |
| Duration: 07h 55m | 12. Data cubes with ODC-STAC | Can I mosaic tiled raster datasets when my area of interest spans multiple files? Can I stack raster datasets that cover the same area along the time dimension in order to explore temporal changes of some quantities? |
| Duration: 08h 40m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data Sets
- Create a new directory on your Desktop called geospatial-python.
- Within geospatial-python, create a directory called data.
- Download the data required for this lesson via this link (678MB).
- Unzip the downloaded file and save its content into the newly created data directory.
Now you should have the following files in the data directory:
- sentinel-2: This is a directory containing multiple bands of Sentinel-2 raster images collected over the island of Rhodes on Aug 27, 2023.
- dem/rhodes_dem.tif: This is the Digital Elevation Model (DEM) of the island of Rhodes, retrieved from the Copernicus Digital Elevation Model (GLO-30). The original tiles have been cropped and mosaicked for this lesson.
- gadm/ADM_ADM_3.gpkg: These are the administrative boundaries of Rhodes, downloaded from GADM and modified for this lesson.
- osm/osm_landuse.gpkg and osm/osm_roads.gpkg: These are the land-use polygons and road polylines of Rhodes, downloaded from OpenStreetMap via Geofabrik and modified for this lesson.
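To confirm that the unzipped data ended up in the right place, a small check like the one below can help. It is only a sketch: the helper name `missing_data_files` is hypothetical, the expected entries are copied from the list above, and the relative path `data` assumes that Python is started from the geospatial-python directory.

```python
from pathlib import Path

# Entries expected inside the data directory, as listed above.
EXPECTED = [
    "sentinel-2",
    "dem/rhodes_dem.tif",
    "gadm/ADM_ADM_3.gpkg",
    "osm/osm_landuse.gpkg",
    "osm/osm_roads.gpkg",
]


def missing_data_files(data_dir):
    """Return the expected entries that do not exist under data_dir."""
    base = Path(data_dir)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


# Assumes the current working directory is geospatial-python.
missing = missing_data_files("data")
if missing:
    print("Missing entries:", ", ".join(missing))
else:
    print("All expected data files were found.")
```

An empty result means every listed file and directory is present; anything reported as missing usually points to an archive that was unzipped into a different location.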
Software Setup
Python is a popular language for
scientific computing, and great for general-purpose programming as well.
There are many ways to install Python and the required dependencies. In
this workshop, we suggest using uv for its fast and
easy installation process.
Software Setup using uv
Please follow the instructions below according to your operating system.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.12 is fine). Also, please set up your Python environment at least a day before the workshop. If you encounter problems with the installation procedure, ask your workshop organizers for assistance via e-mail so that you are ready to go as soon as the workshop begins.
On Linux or macOS, open a terminal and install uv following the official
installation instructions:
Then make sure you are inside the geospatial-python
directory you created during the data setup step by doing:
Finally, run the following command to create a virtual environment and install the required dependencies:
On Windows, first we install uv using PowerShell
following the official
installation instructions:
After the installation, you may see suggestions on the PowerShell
terminal like:
$env:Path = "C:\Users\username\.local\bin;$env:Path"
This means you need to manually add the uv executable to your
system’s PATH variable. Please run the suggested command in your
PowerShell terminal to add uv to your PATH. Otherwise
PowerShell will not recognize the uv command in the next
step.
Then make sure you are inside the geospatial-python
directory you created during the data setup step by doing:
And replace the <Username> pattern (including the
angle brackets <>) with your Windows username.
Finally, run the following command to create a virtual environment and
install the required dependencies:
After the installation, a .venv directory will be
created in the current directory, which contains the virtual environment
with all the required dependencies.
Testing the installation
In order to follow the lesson, you should launch JupyterLab. Let’s
try it now to make sure everything is set up correctly. You should run
the following command in your terminal from the
geospatial-python directory:
uv run jupyter lab
Once you have launched JupyterLab, create a new Python 3 notebook, type the following code snippet in a cell and press the “Play” button:
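As a minimal sketch of such a check, the cell below reports whether a few of the lesson's libraries can be found in the environment. The package list (rioxarray, geopandas, dask) is an assumption based on the libraries named in the episode titles, and the helper `check_packages` is hypothetical; it is not the lesson's official test snippet.

```python
from importlib.util import find_spec

# Assumed package list, taken from libraries named in the episode titles.
PACKAGES = ["rioxarray", "geopandas", "dask"]


def check_packages(names):
    """Map each top-level package name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


for name, available in check_packages(PACKAGES).items():
    print(f"{name}: {'OK' if available else 'MISSING'}")
```

The check only looks the packages up without importing them, so it runs quickly; any line reporting MISSING suggests that the dependency installation step did not complete in this environment.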
If all the steps above completed successfully you are ready to follow along with the lesson!
Alternative: software setup using Anaconda
If you prefer to use Anaconda, you can follow the alternative setup instructions on this page.