This lesson is in the early stages of development (Alpha version)

Introduction to Geospatial Raster and Vector Data with Python

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. To most effectively use these materials, please make sure to download the data and install everything before working through this lesson.

This workshop assumes no prior experience with the tools covered in the workshop. However, learners with prior experience working with geospatial data may be able to skip episodes 1-4, which focus on geospatial concepts and tools. Similarly, learners who have prior experience with the Python programming language may wish to skip the Plotting and Programming in Python lesson.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.


The data and lessons in this workshop were originally developed through a hackathon funded by the National Ecological Observatory Network (NEON), an NSF-funded observatory in Boulder, Colorado, in collaboration with Data Carpentry, SESYNC and CYVERSE. NEON is collecting data for 30 years to help scientists understand how aquatic and terrestrial ecosystems are changing. The data used in these lessons cover two NEON field sites: Harvard Forest and the San Joaquin Experimental Range.

You can download all of the data used in this workshop by clicking this download link. Clicking the download link will download all of the files as a single compressed (.zip) file. To expand this file, double-click the folder icon in your file navigator application (for Macs, this is the Finder application).
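If you would rather expand the archive from Python than from a file browser, the standard-library zipfile module can do it. The following is a minimal sketch, not part of the original lesson: the function name extract_archive and the file and folder names in the usage comment are hypothetical, so substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir.

    Returns the list of member names stored in the archive.
    """
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example usage (hypothetical names, adjust to your download):
# names = extract_archive("workshop-data.zip", "data")
# print(f"Extracted {len(names)} entries into data/")
```

Extracting into a dedicated folder keeps the workshop files together and separate from the rest of your working directory.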

These data files represent a teaching version of the data, with sufficient complexity to teach many aspects of data analysis and management, but with many complexities removed to allow students to focus on the core ideas and skills being taught.

Dataset File name Description
Site layout shapefiles A set of shapefiles for NEON’s Harvard Forest field site and (some) state boundary layers.
Airborne remote sensing data LiDAR data collected by the NEON Airborne Observation Platform (AOP) and processed at NEON, including a canopy height model, digital elevation model and digital surface model for NEON’s Harvard Forest and San Joaquin Experimental Range field sites.

Workshop Overview

Lesson Starting Points Overview
Episode 1: Introduction to Raster Data Understand data structures and common storage and transfer formats for spatial data. Start here if you want to understand fundamental geospatial concepts like coordinate reference systems, rasters, and vectors.
Plotting and Programming in Python Import data into Python, calculate summary statistics, and create publication-quality graphics. Start here if you have an understanding of geospatial concepts but want to learn Python fundamentals.
Episode 5: Intro to Raster Data in Python Open, work with, and plot vector and raster-format spatial data in Python. Start here if you already have a good grasp of geospatial concepts and a working knowledge of Python.


Setup Download files required for the lesson
00:00 1. Introduction to Raster Data What format should I use to represent my data?
What are the main data types used for representing geospatial data?
What are the main attributes of raster data?
00:25 2. Introduction to Vector Data What are the main attributes of vector data?
00:40 3. Coordinate Reference Systems What is a coordinate reference system and how do I interpret one?
01:05 4. The Geospatial Landscape What programs and applications are available for working with geospatial data?
01:15 5. Intro to Raster Data in Python What is a raster dataset?
How do I work with and plot raster data in Python?
How can I handle missing or bad data values for a raster?
02:15 6. Reproject Raster Data with Rioxarray How do I call a DataArray to print out its metadata information?
How do I work with raster data sets that are in different projections?
03:35 7. Raster Calculations in Python How do I subtract one raster from another and extract pixel values for defined locations?
04:35 8. Open and Plot Shapefiles in Python How can I distinguish between and visualize point, line and polygon vector data?
05:05 9. Plot Multiple Shapefiles with Geopandas FIXME How can I create map compositions with custom legends using geopandas?
How can I plot raster and vector data together?
06:05 10. Convert from .csv to a Shapefile in Python How can I import CSV files as shapefiles in Python?
07:05 11. Calculating Zonal Statistics on Rasters
08:05 12. Intro to Raster Data in Python FIXME What do I do when vector data don’t line up?
09:05 13. Manipulate Raster Data in Python FIXME How can I crop raster objects to vector objects, and extract the summary of raster pixels?
10:05 14. Work With Multi-Band Rasters in Python How can I visualize individual and multiple bands in a raster object?
11:05 15. Derive Values from Raster Time Series FIXME How can I calculate, extract, and export summarized raster pixel data?
12:05 16. Raster Time Series Data in Python FIXME How can I view and plot data for different times of the year?
13:05 17. Explore and Plot by Shapefile Attributes How can I compute on the attributes of a spatial object?
14:05 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.