Program Flags
Last updated on 2025-04-28
Overview
Questions
- How can I make an easy shortcut to analyze all files at once using a program flag?
Objectives
- Handle flags and files separately in a command-line program.
Handling Program Flags
Now we have a program which is capable of handling any number of data sets at once.
But what if we have 50 GDP data sets? It would be awfully tedious to type in the names of 50 files in the command line, so let’s add a flag to our program indicating that we would like it to generate a plot for each data set in the current directory.
Flags are a convention used in programming to indicate to a program that a non-default behavior is being requested by the user. In this case, we’ll be using a “-a” flag to indicate to our program we would like it to operate on all data sets in our directory.
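As a minimal sketch of the idea, flags can be told apart from file names by their leading dash. The helper name split_flags_and_files is hypothetical and is not part of the lesson's script; it only illustrates the convention.

```python
import sys


def split_flags_and_files(arguments):
    """Separate flag-like arguments (starting with '-') from file names.

    Hypothetical helper for illustration only.
    """
    flags = [arg for arg in arguments if arg.startswith("-")]
    files = [arg for arg in arguments if not arg.startswith("-")]
    return flags, files


# sys.argv[0] is the script name, so only the remaining items are user input
flags, files = split_flags_and_files(sys.argv[1:])
print("flags:", flags)
print("files:", files)
```

Called with the list ["-a", "data/x.csv"], the helper would return ["-a"] as the flags and ["data/x.csv"] as the files.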
To explore what files are in the current directory, we’ll be using Python’s glob module.
- In Unix, the term “globbing” means “matching a set of files with a pattern”.
- The most common patterns are:
  - * meaning “match zero or more characters”
  - ? meaning “match exactly one character”
- Python contains the glob library to provide pattern matching functionality.
- The glob library contains a function also called glob to match file patterns.
- E.g., glob.glob('*.txt') matches all files in the current directory whose names end with .txt.
- Result is a (possibly empty) list of character strings.
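The patterns above can be tried out safely in a throwaway directory. This is only a sketch: the file names are made up, and the temporary directory stands in for the lesson's data folder.

```python
import glob
import os
import tempfile

# build a throwaway directory containing a few empty, made-up files
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.txt", "b.txt", "ab.txt", "notes.csv"]:
        open(os.path.join(folder, name), "w").close()

    # * matches zero or more characters, so all three .txt files match
    txt_files = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(folder, "*.txt"))
    )

    # ? matches exactly one character, so ab.txt is left out
    one_char = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(folder, "?.txt"))
    )

    # a pattern that matches nothing gives an empty list, not an error
    no_match = glob.glob(os.path.join(folder, "*.json"))

print(txt_files)  # ['a.txt', 'ab.txt', 'b.txt']
print(one_char)   # ['a.txt', 'b.txt']
print(no_match)   # []
```

glob.glob does not promise any particular order, which is why the results are sorted before printing.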
import sys
import glob
import pandas
# we need to import part of matplotlib
# because we are no longer in a notebook
import matplotlib.pyplot as plt

# check for -a flag in arguments
if "-a" in sys.argv:
    filenames = glob.glob("data/*gdp*.csv")
else:
    filenames = sys.argv[1:]

for filename in filenames:

    # load data and transpose so that country names are
    # the columns and their gdp data becomes the rows
    data = pandas.read_csv(filename, index_col = 'country').T

    # create a plot of the transposed data
    ax = data.plot(title = filename)

    # set some plot attributes
    ax.set_xlabel("Year")
    ax.set_ylabel("GDP Per Capita")
    # set the x locations and labels
    ax.set_xticks(range(len(data.index)))
    ax.set_xticklabels(data.index, rotation = 45)

    # save the plot with a unique file name
    split_name1 = filename.split('.')[0] #data/gapminder_gdp_XXX
    # take the part after the directory, without the .csv extension
    split_name2 = split_name1.split('/')[1]
    save_name = 'figs/'+split_name2 + '.png'
    plt.savefig(save_name)
Let’s test whether our run-all flag works by running the script before we commit it. It is always good practice to run your code first so you don’t accidentally commit broken code.
OUTPUT
Traceback (most recent call last):
File "gdp_plots.py", line 23, in <module>
ax = data.plot(title = filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_core.py", line 1031, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
plot_obj.generate()
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/core.py", line 451, in generate
self._compute_plot_data()
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/core.py", line 636, in _compute_plot_data
raise TypeError("no numeric data to plot")
TypeError: no numeric data to plot
This error says that the data in one or more of our files is non-numeric, so pandas doesn’t know how to plot it. Let’s add a print statement to our code to check the head of the data each time through the loop.
import sys
import glob
import pandas
# we need to import part of matplotlib
# because we are no longer in a notebook
import matplotlib.pyplot as plt

# check for -a flag in arguments
if "-a" in sys.argv:
    filenames = glob.glob("data/*gdp*.csv")
else:
    filenames = sys.argv[1:]

for filename in filenames:

    # load data and transpose so that country names are
    # the columns and their gdp data becomes the rows
    data = pandas.read_csv(filename, index_col = 'country').T

    # show which file is being processed and what its data looks like
    print(filename)
    print(data.head())

    # create a plot of the transposed data
    ax = data.plot(title = filename)

    # set some plot attributes
    ax.set_xlabel("Year")
    ax.set_ylabel("GDP Per Capita")
    # set the x locations and labels
    ax.set_xticks(range(len(data.index)))
    ax.set_xticklabels(data.index, rotation = 45)

    # save the plot with a unique file name
    split_name1 = filename.split('.')[0] #data/gapminder_gdp_XXX
    # take the part after the directory, without the .csv extension
    split_name2 = split_name1.split('/')[1]
    save_name = 'figs/'+split_name2 + '.png'
    plt.savefig(save_name)
Let’s run the code again with our print statements:
OUTPUT
data/gapminder_gdp_americas.csv
country Argentina Bolivia Brazil ... United States Uruguay Venezuela
continent Americas Americas Americas ... Americas Americas Americas
gdpPercap_1952 5911.315053 2677.326347 2108.944355 ... 13990.48208 5716.766744 7689.799761
gdpPercap_1957 6856.856212 2127.686326 2487.365989 ... 14847.12712 6150.772969 9802.466526
gdpPercap_1962 7133.166023 2180.972546 3336.585802 ... 16173.14586 5603.357717 8422.974165
gdpPercap_1967 8052.953021 2586.886053 3429.864357 ... 19530.36557 5444.61962 9541.474188
[5 rows x 25 columns]
Traceback (most recent call last):
File "gdp_plots.py", line 23, in <module>
ax = data.plot(title = filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_core.py", line 1031, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
plot_obj.generate()
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/core.py", line 451, in generate
self._compute_plot_data()
File "/opt/anaconda3/lib/python3.11/site-packages/pandas/plotting/_matplotlib/core.py", line 636, in _compute_plot_data
raise TypeError("no numeric data to plot")
TypeError: no numeric data to plot
Now we can see that the Americas GDP file contains a row of continent data. Before we transposed the table, this was a continent column. Because every value within a column must share one type, a column that holds both a string and numbers is stored by pandas as generic objects rather than as numbers. So it isn’t seeing any(!) of our number values as numeric data that can be plotted.
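The effect can be reproduced on a tiny table. This is a sketch with made-up, rounded values shaped like the lesson's files; infer_objects is used here only to show that the remaining values can be read as numbers again once the continent row is gone.

```python
import pandas

# made-up values, shaped like the lesson's GDP files
raw = pandas.DataFrame(
    {
        "country": ["Argentina", "Bolivia"],
        "continent": ["Americas", "Americas"],
        "gdpPercap_1952": [5911.3, 2677.3],
        "gdpPercap_1957": [6856.9, 2127.7],
    }
).set_index("country")

# after transposing, each country column mixes a string with numbers,
# so every column ends up with the generic object dtype
data = raw.T
print(data.dtypes)

# without the continent row, the columns can be inferred as float64
numeric = data.drop("continent").infer_objects()
print(numeric.dtypes)
```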
We can add a check into our code that drops the continent row if it exists.
import sys
import glob
import pandas
# we need to import part of matplotlib
# because we are no longer in a notebook
import matplotlib.pyplot as plt

# check for -a flag in arguments
if "-a" in sys.argv:
    filenames = glob.glob("data/*gdp*.csv")
else:
    filenames = sys.argv[1:]

for filename in filenames:

    # load data and transpose so that country names are
    # the columns and their gdp data becomes the rows
    data = pandas.read_csv(filename, index_col = 'country').T

    # drop the non-numeric continent row if the file has one
    if "continent" in data.index:
        data.drop("continent", inplace=True)

    # show which file is being processed and what its data looks like
    print(filename)
    print(data.head())

    # create a plot of the transposed data
    ax = data.plot(title = filename)

    # set some plot attributes
    ax.set_xlabel("Year")
    ax.set_ylabel("GDP Per Capita")
    # set the x locations and labels
    ax.set_xticks(range(len(data.index)))
    ax.set_xticklabels(data.index, rotation = 45)

    # save the plot with a unique file name
    split_name1 = filename.split('.')[0] #data/gapminder_gdp_XXX
    # take the part after the directory, without the .csv extension
    split_name2 = split_name1.split('/')[1]
    save_name = 'figs/'+split_name2 + '.png'
    plt.savefig(save_name)
Now running the script again works!
It still prints the filename and the head of each data frame, though. We should delete those lines, so our final code should be the following script:
import sys
import glob
import pandas
# we need to import part of matplotlib
# because we are no longer in a notebook
import matplotlib.pyplot as plt

# check for -a flag in arguments
if "-a" in sys.argv:
    filenames = glob.glob("data/*gdp*.csv")
else:
    filenames = sys.argv[1:]

for filename in filenames:

    # load data and transpose so that country names are
    # the columns and their gdp data becomes the rows
    data = pandas.read_csv(filename, index_col = 'country').T

    # drop the non-numeric continent row if the file has one
    if "continent" in data.index:
        data.drop("continent", inplace=True)

    # create a plot of the transposed data
    ax = data.plot(title = filename)

    # set some plot attributes
    ax.set_xlabel("Year")
    ax.set_ylabel("GDP Per Capita")
    # set the x locations and labels
    ax.set_xticks(range(len(data.index)))
    ax.set_xticklabels(data.index, rotation = 45)

    # save the plot with a unique file name
    split_name1 = filename.split('.')[0] #data/gapminder_gdp_XXX
    # take the part after the directory, without the .csv extension
    split_name2 = split_name1.split('/')[1]
    save_name = 'figs/'+split_name2 + '.png'
    plt.savefig(save_name)
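The script builds each output name by splitting the input file name on '.' and '/'. As a sketch of an alternative, Python's pathlib module can do the same job without assuming exactly one directory level or one dot in the name. The function name figure_name and its folder parameter are hypothetical, not part of the lesson's script.

```python
from pathlib import Path


def figure_name(filename, folder="figs"):
    """Build the output path for a plot from a data file name.

    Hypothetical helper: keeps only the file's base name, swaps the
    extension for .png, and places the result in the given folder.
    """
    stem = Path(filename).stem  # e.g. gapminder_gdp_americas
    return (Path(folder) / f"{stem}.png").as_posix()


print(figure_name("data/gapminder_gdp_americas.csv"))
# figs/gapminder_gdp_americas.png
```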
Updating the repository
Yet another successful update to the code. Let’s commit our changes.
The Right Way to Do It
If our programs can take complex parameters or multiple filenames, we
shouldn’t handle sys.argv
directly. Instead, we should use
Python’s argparse
library, which handles common cases in a
systematic way, and also makes it easy for us to provide sensible error
messages for our users. We will not cover this module in this lesson but
you can go to Tshepang Lekhonkhobe’s Argparse
tutorial that is part of Python’s Official Documentation.
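For orientation, here is a minimal sketch of how the same flag might look with argparse. The long option name --all, the help texts, and the function names are assumptions made for illustration; the lesson's script does not use them.

```python
import argparse
import glob


def parse_arguments(arguments=None):
    """Parse the command line; pass a list to test without a real shell."""
    parser = argparse.ArgumentParser(description="Plot GDP data sets.")
    # store_true makes -a a simple on/off flag that defaults to False
    parser.add_argument("-a", "--all", action="store_true",
                        help="plot every GDP data set in the data directory")
    # nargs="*" accepts zero or more file names
    parser.add_argument("filenames", nargs="*",
                        help="data files to plot")
    return parser.parse_args(arguments)


def select_filenames(args):
    """Return the files to process, following the lesson's -a behavior."""
    if args.all:
        return glob.glob("data/*gdp*.csv")
    return args.filenames


args = parse_arguments(["data/gapminder_gdp_asia.csv"])
print(args.all)                # False
print(select_filenames(args))  # ['data/gapminder_gdp_asia.csv']
```

In a real script, calling parse_arguments() with no list would read sys.argv, and argparse would also generate a -h help message and report unknown flags to the user.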
Key Points
- Adding command line flags can be a user-friendly way to accomplish common tasks.