Explore and Plot by Shapefile Attributes
OverviewTeaching: 40 min
Exercises: 20 minQuestions
How can I compute on the attributes of a spatial object?Objectives
Query attributes of a spatial object.
Subset spatial objects using specific attribute values.
Plot a shapefile, colored by unique attribute values.
# learners will have this data loaded from previous episodes point_HARV = gpd.read_file("data/NEON-DS-Site-Layout-Files/HARV/HARVtower_UTM18N.shp") lines_HARV = gpd.read_file("data/NEON-DS-Site-Layout-Files/HARV/HARV_roads.shp") aoi_boundary_HARV <- gpd.read_file( "data/NEON-DS-Site-Layout-Files/HARV/HarClip_UTMZ18.shp")
Things You’ll Need To Complete This Episode
See the lesson homepage for detailed information about the software, data, and other prerequisites you will need to work through the examples in this episode.
This episode continues our discussion of shapefile attributes and covers how to work with shapefile attributes in Python. It covers how to identify and query shapefile attributes, as well as how to subset shapefiles by specific attribute values. Finally, we will learn how to plot a shapefile according to a set of attribute values.
Load the Data
We will continue using the
matplotlib.pyplot packages in this episode. Make sure that you have these packages loaded. We will
continue to work with the three shapefiles that we loaded in the
Open and Plot Shapefiles in R episode.
Query Shapefile Metadata
As we discussed in the
Open and Plot Shapefiles in R episode,
we can view metadata associated with a
.type- The type of vector data stored in the object.
len- The number of features in the object
.bounds- The spatial extent (geographic area covered by) of the object.
.crs- The CRS (spatial projection) of the data.
We started to explore our
point_HARV object in the previous episode.
We can view the object with
point_HARV or print a summary of the object itself to the console.
We view the columns in
.columns to count the number of attributes associated with a spatial object too. Note that the geometry is just another column and counts towards the total.
Challenge: Attributes for Different Spatial Classes
Explore the attributes associated with the
- How many attributes does each have?
- Who owns the site in the
Which of the following is NOT an attribute of the
A) Latitude B) County C) Country
1) To find the number of attributes, we use the
2) Ownership information is in a column named
3) To see a list of all of the attributes, we can use the
“Country” is not an attribute of this object.
Explore Values within One Attribute
We can explore individual values stored within a particular attribute.
Comparing attributes to a spreadsheet or a data frame, this is similar
to exploring values in a column. We did this with the
gapminder dataframe in an earlier lesson. For
GeoDataFrames, we can use the same syntax:
We can see the contents of the
TYPE field of our lines shapefile:
To see only unique values within the
TYPE field, we can use the
np.unique() function for extracting the possible values of a
categorical (or numerical) variable.
We can use the
filter() function from
dplyr that we worked with in an earlier lesson to select a subset of features
from a spatial object in Python, just like with data frames.
For example, we might be interested only in features that are of
TYPE “footpath”. Once we subset out this data, we can use it as input to other code so that code only operates on the footpath lines.
footpath_HARV = lines_HARV[lines_HARV.TYPE == "footpath"] len(footpath_HARV)
Our subsetting operation reduces the
features count to 2. This means
that only two feature lines in our spatial object have the attribute
TYPE == footpath. We can plot only the footpath lines:
There are two features in our footpaths subset. Why does the plot look like
there is only one feature? Let’s adjust the colors used in our plot. If we have
2 features in our vector object, we can plot each using a unique color by
assigning a color map, or
cmap to each geometry/row in our
We can also alter the default line thickness by using the
size = parameter,
as the default value can be hard to see.
Now, we see that there are in fact two features in our plot!
Challenge: Subset Spatial Line Objects Part 1
Subset out all
woods roadfrom the lines layer and plot it. There are many more color maps to use, so if you’d like, do a web search to find a matplotlib
cmapthat works better for this plot than
First we will save an object with only the boardwalk lines:
woods_road_HARV = lines_HARV[lines_HARV.TYPE == "woods_road_HARV"]
Let’s check how many features there are in this subset:
Now let’s plot that data:
Adjust Line Width
We adjusted line color by applying an arbitrary color map earlier. If we want a unique line color for each attribute category
GeoDataFrame, we can use the following argument,
column, as well as some style arguments to improv ethe visuals.
We already know that we have four different
TYPE levels in the lines_HARV object, so we will set four different line colors.
import matplotlib.pyplot as plt plt.style.use("ggplot") lines_HARV.plot(column="TYPE", linewidth=3, legend=True, figsize=(16,10))
Our map is starting together, in the next lesson we will add our Canopy Height Model that we calculated in an earlier episode.
Challenge: Plot Polygon by Attribute
- Create a map of the state boundaries in the United States using the data located in your downloaded data folder:
NEON-DS-Site-Layout-Files/US-Boundary-Layers\US-State-Boundaries-Census-2014. Apply a fill color to each state using its
regionvalue. Add a legend.
First we read in the data and check how many levels there are in the
state_boundary_US = gpd.read_file("data/NEON-DS-Site-Layout-Files/US-Boundary-Layers/US-State-Boundaries-Census-2014.shp") np.unique(state_boundary_US.region)
Now we can create our plot:
state_boundary_US.plot(column = "region", linewidth = 2, legend = True, figsize=(20,5))
geopandasis similar to standard
pandasdata frames and can be manipulated using the same functions.
Almost any feature of a plot can be customized using the various functions and options in the