Introduction to Raster Data


  • Raster data is pixelated data where each pixel is associated with a specific location.
  • Raster data always has an extent and a resolution.
  • The extent is the geographical area covered by a raster.
  • The resolution is the area covered by each pixel of a raster.
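
Extent and resolution are linked through the array shape. The following is a minimal pure-Python sketch. It assumes a "north-up" raster with no rotation, positioned by its upper-left corner. The function name and all numbers are hypothetical. In practice, resolution is usually quoted as the pixel width and height (for example 10 m), and the pixel area follows from those.

```python
def raster_extent(x_min, y_max, pixel_width, pixel_height, n_cols, n_rows):
    """Return (x_min, y_min, x_max, y_max) covered by a north-up raster.

    Hypothetical helper: assumes no rotation and an upper-left origin.
    """
    x_max = x_min + n_cols * pixel_width
    y_min = y_max - n_rows * pixel_height
    return (x_min, y_min, x_max, y_max)


# Hypothetical 10 m raster with 500 columns and 300 rows whose
# upper-left corner sits at (600000, 5800000) in a projected CRS.
extent = raster_extent(600_000, 5_800_000, 10, 10, n_cols=500, n_rows=300)
print(extent)  # (600000, 5797000, 605000, 5800000)

# Each pixel covers pixel_width * pixel_height square map units.
pixel_area = 10 * 10
print(pixel_area)  # 100
```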

Introduction to Vector Data


  • Vector data structures represent specific features on the Earth’s surface along with attributes of those features.
  • Vector objects are either points, lines, or polygons.
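
A vector feature is a geometry (a sequence of coordinates) plus attributes. Below is a stdlib-only sketch that represents a point, a line, and a polygon as plain tuples. It computes line length and polygon area (shoelace formula). The feature names and coordinates are hypothetical. Libraries such as shapely, which geopandas builds on, expose the same quantities as the `.length` and `.area` of a geometry.

```python
import math

# Hypothetical features in a projected CRS (units: metres), each with an attribute.
point = {"geometry": (2.0, 3.0), "name": "well"}
line = {"geometry": [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)], "name": "path"}
polygon = {"geometry": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)], "name": "field"}


def line_length(coords):
    """Sum the straight-segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula.

    Vertices are listed in order; the ring is closed implicitly.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line["geometry"]))      # 10.0
print(polygon_area(polygon["geometry"]))  # 12.0
```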

Coordinate Reference Systems


  • All geospatial datasets (raster and vector) are associated with a specific coordinate reference system.
  • A coordinate reference system includes datum, projection, and additional parameters specific to the dataset.
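
The same location has different coordinates in different CRSs. This sketch uses the spherical Web Mercator projection (EPSG:3857) purely as an example. It projects longitude/latitude in degrees (EPSG:4326) to metres. Real reprojection should go through a library such as pyproj (used by geopandas' `.to_crs()` and rioxarray), which handles datums and many other projections.

```python
import math

# Sphere radius used by Web Mercator (the WGS 84 semi-major axis, in metres).
R = 6_378_137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator metres.

    Spherical Mercator formulas; only meaningful for |lat| below about 85.05 degrees.
    """
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(lonlat_to_web_mercator(0, 0))    # approximately (0, 0)
print(lonlat_to_web_mercator(180, 0))  # x is about 20037508.34 m
```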

The Geospatial Landscape


  • Many software packages exist for working with geospatial data.
  • Command-line programs allow you to automate and reproduce your work.
  • JupyterLab provides a user-friendly interface for working with Python.

Access satellite imagery using Python


  • Accessing satellite images via the providers’ API enables more reliable and scalable data retrieval than manual downloads.
  • STAC catalogs can be browsed and searched using the same tools and scripts, regardless of the data provider.
  • rioxarray allows you to open and download remote raster files.
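
A STAC API search is an HTTP request with a small JSON body. That is why the same script works against any compliant catalog. The sketch below builds, but does not send, a POST request to a `/search` endpoint using only the standard library. The Earth Search URL, collection name, bounding box and dates are example values. Clients such as pystac-client wrap this request and its pagination for you.

```python
import json
import urllib.request

# Example public STAC API; any STAC API with a /search endpoint would do.
STAC_API_URL = "https://earth-search.aws.element84.com/v1"


def build_stac_search(bbox, datetime_range, collections, limit=10):
    """Build (but do not send) a POST request to a STAC API /search endpoint."""
    payload = {
        "bbox": list(bbox),            # [min_lon, min_lat, max_lon, max_lat] in WGS 84
        "datetime": datetime_range,    # RFC 3339 interval "start/end"
        "collections": list(collections),
        "limit": limit,                # page size requested from the server
    }
    return urllib.request.Request(
        url=f"{STAC_API_URL}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_stac_search(
    bbox=(4.8, 52.3, 5.0, 52.4),
    datetime_range="2023-06-01T00:00:00Z/2023-06-30T23:59:59Z",
    collections=["sentinel-2-l2a"],
)
# Sending it with urllib.request.urlopen(request) requires network access and
# should return a GeoJSON FeatureCollection of matching items.
```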

Read and visualize raster data


  • rioxarray and xarray are to multidimensional arrays what pandas is to tabular data.
  • rioxarray stores CRS information as a CRS object that can be converted to an EPSG code or PROJ4 string.
  • Missing raster data are filled with nodata values, which should be handled with care for statistics and visualization.
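
Nodata values become a problem when they are treated as real measurements. This NumPy sketch uses a hypothetical band with -9999 as the nodata value. It shows how a naive mean is distorted, and how replacing nodata with NaN and using a NaN-aware reduction fixes it. With rioxarray, opening a file with `rioxarray.open_rasterio(..., masked=True)` performs the NaN replacement for you.

```python
import numpy as np

# A tiny hypothetical raster band where -9999 marks missing pixels.
NODATA = -9999
band = np.array([[10, 12, NODATA],
                 [11, NODATA, 15]], dtype="float64")

# Naive statistics treat nodata as a real value and are badly skewed.
print(band.mean())  # -3325.0

# Replace nodata with NaN, then use NaN-aware reductions.
masked = np.where(band == NODATA, np.nan, band)
print(np.nanmean(masked))  # 12.0
```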

Vector data in Python


  • Load spatial objects into Python with the geopandas.read_file() function.
  • Spatial objects can be plotted directly with GeoDataFrame’s .plot() method.
  • Crop spatial objects with the .cx[] indexer.
  • Convert CRS of spatial objects with .to_crs().
  • Clip spatial features to the boundary of another geometry with .clip().
  • Create a buffer of spatial objects with .buffer().
  • Merge overlapping spatial objects with .dissolve().
  • Join features based on their spatial relationship with .sjoin().
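
A spatial join attaches attributes from one layer to another based on a spatial predicate. For points and polygons the typical predicate is "point within polygon". The stdlib-only sketch below illustrates what such a join computes, using a ray-casting point-in-polygon test. It is not how geopandas implements `.sjoin()`, which relies on shapely predicates and a spatial index. The result is roughly what `points.sjoin(fields, predicate="within")` would give for equivalent GeoDataFrames. All names and data are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical data: two adjacent field polygons and three sample points.
fields = [
    {"field_id": "A", "ring": [(0, 0), (5, 0), (5, 5), (0, 5)]},
    {"field_id": "B", "ring": [(5, 0), (10, 0), (10, 5), (5, 5)]},
]
points = [
    {"sample": 1, "xy": (1, 1)},
    {"sample": 2, "xy": (7, 2)},
    {"sample": 3, "xy": (20, 20)},
]


def spatial_join(points, polygons):
    """Inner join: keep points inside a polygon and add that polygon's field_id."""
    joined = []
    for p in points:
        for poly in polygons:
            if point_in_polygon(*p["xy"], poly["ring"]):
                joined.append({**p, "field_id": poly["field_id"]})
    return joined


print(spatial_join(points, fields))
```

Sample 3 lies outside both fields, so an inner join drops it.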

Crop raster data with rioxarray and geopandas


  • Use .rio.clip_box() to crop a raster with a bounding box.
  • Use .rio.clip() to crop a raster with a given polygon.
  • Use .rio.reproject_match() to reproject one raster onto the grid (CRS, resolution, and extent) of another.
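
Cropping with a bounding box comes down to converting map coordinates into row and column indices, then slicing the array. The NumPy sketch below does this for a hypothetical north-up raster with 10 m pixels. It keeps every pixel that overlaps the box. rioxarray's `.rio.clip_box()` handles this, including the coordinates, and its rule for pixels straddling the box edge may differ from this sketch.

```python
import numpy as np

# Hypothetical north-up raster: upper-left corner (x0, y0) and 10 m square pixels.
x0, y0, res = 600_000.0, 5_800_000.0, 10.0
raster = np.arange(100 * 100).reshape(100, 100)  # 100 rows x 100 columns


def clip_to_box(array, x0, y0, res, minx, miny, maxx, maxy):
    """Return the pixels overlapping a bounding box (sketch; no rotation)."""
    col_start = int(np.floor((minx - x0) / res))
    col_stop = int(np.ceil((maxx - x0) / res))
    # Row indices grow downwards, so the top of the box (maxy) gives the first row.
    row_start = int(np.floor((y0 - maxy) / res))
    row_stop = int(np.ceil((y0 - miny) / res))
    # Avoid negative start indices; NumPy clamps stops past the array edge.
    row_start, col_start = max(row_start, 0), max(col_start, 0)
    return array[row_start:row_stop, col_start:col_stop]


subset = clip_to_box(raster, x0, y0, res,
                     minx=600_100, miny=5_799_500, maxx=600_300, maxy=5_799_800)
print(subset.shape)  # (30, 20)
```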

Raster Calculations in Python


  • Python’s built-in math operators are fast and simple options for raster math.
  • numpy.digitize can be used to classify raster values in order to generate a less complicated map.
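
Arithmetic operators act pixel by pixel on arrays, so index formulas translate directly into code. The NumPy sketch below uses hypothetical red and near-infrared bands to compute NDVI = (NIR - red) / (NIR + red). It then classifies the result with `numpy.digitize` into three classes. The class boundaries are arbitrary example values. The same operators work on xarray DataArrays opened with rioxarray.

```python
import numpy as np

# Hypothetical red and near-infrared reflectance bands on the same grid.
red = np.array([[0.10, 0.20], [0.05, 0.30]])
nir = np.array([[0.50, 0.40], [0.45, 0.30]])

# Element-wise arithmetic: one NDVI value per pixel.
ndvi = (nir - red) / (nir + red)

# Example classes: 0 for NDVI < 0.2, 1 for 0.2 <= NDVI < 0.5, 2 for NDVI >= 0.5.
bins = [0.2, 0.5]
classes = np.digitize(ndvi, bins)
print(classes)
```

`np.digitize` returns, for each value, the index of the bin it falls into. A few class labels are easier to map than a continuous range.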

Calculating Zonal Statistics on Rasters


  • Zones can be defined from the attribute columns of a vector dataset.
  • Zones can be rasterized using rasterio.features.rasterize.
  • Calculate zonal statistics with xrspatial.zonal_stats over the rasterized zones.
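
Once the zones are rasterized onto the same grid as the data, a zonal statistic is a grouped reduction over pixels. The NumPy sketch below computes only the per-zone mean, using `np.bincount`. It assumes zone ID 0 marks pixels outside every zone, which matches the default `fill=0` of `rasterio.features.rasterize`. `xrspatial.zonal_stats` computes several statistics at once. The arrays and the helper name are hypothetical.

```python
import numpy as np

# Hypothetical rasterized zones (0 = outside any zone) and values on the same grid.
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [0, 0, 2]])
values = np.array([[4.0, 6.0, 1.0],
                   [5.0, 3.0, 2.0],
                   [9.0, 9.0, 6.0]])


def zonal_mean(zones, values, background=0):
    """Mean of `values` per zone ID, ignoring the background zone."""
    inside = zones != background
    ids = zones[inside]
    sums = np.bincount(ids, weights=values[inside])  # per-zone sum of values
    counts = np.bincount(ids)                        # per-zone pixel count
    present = np.nonzero(counts)[0]                  # zone IDs that actually occur
    return {int(z): float(sums[z] / counts[z]) for z in present}


print(zonal_mean(zones, values))  # {1: 5.0, 2: 3.0}
```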

Parallel raster computations using Dask


  • The %%time Jupyter magic command can be used to profile calculations.
  • Data ‘chunks’ are the unit of parallelization in raster calculations.
  • (rio)xarray can open raster files as chunked arrays.
  • The chunk shape and size can significantly affect the calculation performance.
  • Cloud-optimized GeoTIFFs have an internal structure that enables efficient parallel reads.
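
Chunks are the unit of parallel work. Each chunk is processed independently and the partial results are then combined. The sketch below imitates this for a mean using NumPy and a thread pool from the standard library. It is a simplified illustration, not how Dask schedules its task graph. With rioxarray, passing `chunks=` to `rioxarray.open_rasterio` gives you Dask-backed arrays that follow the same split, compute, combine pattern. The helper names and chunk shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def iter_chunks(shape, chunk_shape):
    """Yield slice pairs that tile a 2-D array into blocks of at most chunk_shape."""
    n_rows, n_cols = shape
    r_step, c_step = chunk_shape
    for r in range(0, n_rows, r_step):
        for c in range(0, n_cols, c_step):
            yield (slice(r, r + r_step), slice(c, c + c_step))


def chunked_mean(array, chunk_shape, max_workers=4):
    """Mean from per-chunk partial sums computed in parallel threads."""
    def partial(block_slices):
        block = array[block_slices]
        return block.sum(), block.size  # partial result for one chunk

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(partial, iter_chunks(array.shape, chunk_shape)))
    # Combine step: total sum divided by total pixel count.
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


data = np.random.default_rng(0).random((1000, 1200))
print(np.isclose(chunked_mean(data, (256, 256)), data.mean()))  # True
```

Very small chunks create many tiny tasks whose overhead dominates. Very large chunks limit parallelism and memory headroom. This is why chunk shape and size affect performance.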