Content from Introduction
Last updated on 2025-04-16
Estimated time: 25 minutes
Overview
Questions
- How can I make my results easier to reproduce?
Objectives
- Explain what SCons is for.
- Explain why SCons differs from shell scripts.
- Name other popular build tools.
Let’s imagine that we’re interested in testing Zipf’s Law in some of our favorite books.
Zipf’s Law
The most frequently-occurring word occurs approximately twice as often as the second most frequent word. This is Zipf’s Law.
We’ve compiled our raw data, i.e. the books we want to analyze, and have prepared several Python scripts that together make up our analysis pipeline.
Let’s take a quick look at one of the books using the command `head books/isles.txt`.
Our directory has the Python scripts and data files we will be working with:
OUTPUT
|- books
| |- abyss.txt
| |- isles.txt
| |- last.txt
| |- LICENSE_TEXTS.md
| |- sierra.txt
|- plotcounts.py
|- countwords.py
|- testzipf.py
The first step is to count the frequency of each word in a book. For this purpose we will use a Python script, `countwords.py`, which takes two command line arguments. The first argument is the input file (`books/isles.txt`) and the second is the output file that is generated (here `isles.dat`) by processing the input.
Let’s take a quick peek at the result.
This shows us the top 5 lines in the output file:
OUTPUT
the 3822 6.7371760973
of 2460 4.33632998414
and 1723 3.03719372466
to 1479 2.60708619778
a 1308 2.30565838181
We can see that the file consists of one row per word. Each row shows the word itself, the number of occurrences of that word, and the number of occurrences as a percentage of the total number of words in the text file.
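The row layout described above can be checked with a few lines of Python. This is only a sketch: `parse_row` is a hypothetical helper written for illustration, not part of `countwords.py`, and the rows are copied from the output shown above.

```python
# The five rows shown above, copied from the lesson output.
sample_rows = """\
the 3822 6.7371760973
of 2460 4.33632998414
and 1723 3.03719372466
to 1479 2.60708619778
a 1308 2.30565838181
"""


def parse_row(line):
    """Split one row into (word, count, percentage of all words)."""
    word, count, percent = line.split()
    return word, int(count), float(percent)


rows = [parse_row(line) for line in sample_rows.splitlines()]

# Each row implies a total word count: count / (percent / 100).
# If every percentage uses the same denominator, the set has one element.
implied_totals = {round(count / (percent / 100)) for _, count, percent in rows}

for word, count, percent in rows:
    print(f"{word:>4} {count:>5} {percent:6.2f}%")
print("implied total word count(s):", implied_totals)
```

If the third column really is a percentage of the whole text, every row should imply the same total number of words.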
We can do the same thing for a different book:
OUTPUT
the 4044 6.35449402891
and 2807 4.41074795726
of 1907 2.99654305468
a 1594 2.50471401634
to 1515 2.38057825267
Let’s visualize the results. The script `plotcounts.py` reads in a data file and plots the 10 most frequently occurring words as a text-based bar plot:
OUTPUT
the ########################################################################
of ##############################################
and ################################
to ############################
a #########################
in ###################
is #################
that ############
by ###########
it ###########
`plotcounts.py` can also show the plot graphically:
Close the window to exit the plot.
`plotcounts.py` can also create the plot as an image file (e.g. a PNG file):
Finally, let’s test Zipf’s law for these books:
OUTPUT
Book First Second Ratio
abyss 4044 2807 1.44
isles 3822 2460 1.55
So we’re not too far off from Zipf’s law.
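The ratios in the table can be reproduced by hand. The sketch below assumes nothing about how `testzipf.py` is implemented; `zipf_ratio` is a hypothetical helper and the counts are copied from the table above.

```python
# Counts of the two most frequent words, copied from the table above.
top_two_counts = {
    "abyss": (4044, 2807),
    "isles": (3822, 2460),
}


def zipf_ratio(first, second):
    """Return how many times more often the top word occurs than the runner-up."""
    return first / second


ratios = {
    book: round(zipf_ratio(first, second), 2)
    for book, (first, second) in top_two_counts.items()
}

for book, ratio in ratios.items():
    # Zipf's Law, as stated above, predicts a ratio of roughly 2.
    print(f"{book}\t{ratio:.2f}")
```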
Together these scripts implement a common workflow:
- Read a data file.
- Perform an analysis on this data file.
- Write the analysis results to a new file.
- Plot a graph of the analysis results.
- Save the graph as an image, so we can put it in a paper.
- Make a summary table of the analyses.
Running `countwords.py` and `plotcounts.py` at the shell prompt, as we have been doing, is fine for one or two files. If, however, we had 5 or 10 or 20 text files, or if the number of steps in the pipeline were to expand, this could turn into a lot of work. Plus, no one wants to sit and wait for a command to finish, even just for 30 seconds.
The most common solution to the tedium of data processing is to write a shell script that runs the whole pipeline from start to finish.
So to reproduce the tasks that we have just done, we create a new file named `run_pipeline.sh` in which we place the commands one by one. Using a text editor of your choice (e.g. for nano use the command `nano run_pipeline.sh`), copy and paste the following text and save it.
BASH
# USAGE: bash run_pipeline.sh
# to produce plots for isles and abyss
# and the summary table for the Zipf's law tests
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python plotcounts.py isles.dat isles.png
python plotcounts.py abyss.dat abyss.png
# Generate summary table
python testzipf.py abyss.dat isles.dat > results.txt
Run the script and check that the output is the same as before:
This shell script solves several problems in computational reproducibility:
- It explicitly documents our pipeline, making communication with colleagues (and our future selves) more efficient.
- It allows us to type a single command, `bash run_pipeline.sh`, to reproduce the full analysis.
- It prevents us from repeating typos or mistakes. You might not get it right the first time, but once you fix something it’ll stay fixed.
Despite these benefits it has a few shortcomings.
Let’s adjust the width of the bars in our plot produced by `plotcounts.py`.
Edit `plotcounts.py` so that the bars are 0.8 units wide instead of 1 unit. (Hint: replace `width = 1.0` with `width = 0.8` in the definition of `plot_word_counts`.)
Now we want to recreate our figures. We could just `bash run_pipeline.sh` again. That would work, but it could also be a big pain if counting words takes more than a few seconds. The word counting routine hasn’t changed; we shouldn’t need to recreate those files.
Alternatively, we could manually rerun the plotting for each word-count file. (Experienced shell scripters can make this easier on themselves using a for-loop.)
With this approach, however, we don’t get many of the benefits of having a shell script in the first place.
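The loop mentioned above is described as a shell for-loop; the same idea is sketched here in Python instead, to stay in the language of the analysis scripts. The list of data files is an assumption based on the files created so far, and the commands are only printed, not executed.

```python
import shlex
from pathlib import Path

# Word-count files produced earlier in the lesson (assumed to exist).
dat_files = ["isles.dat", "abyss.dat"]

# Build one plotting command per data file, deriving the image name
# from the data file name (isles.dat -> isles.png).
commands = [
    ["python", "plotcounts.py", dat, str(Path(dat).with_suffix(".png"))]
    for dat in dat_files
]

for command in commands:
    # Print instead of running, so the sketch is safe to try anywhere.
    # To execute for real: subprocess.run(command, check=True)
    print(shlex.join(command))
```

Even with a loop, we still have to decide by hand which steps to rerun, which is the problem a build tool addresses.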
Another popular option is to comment out a subset of the lines in `run_pipeline.sh`:
BASH
# USAGE: bash run_pipeline.sh
# to produce plots for isles and abyss
# and the summary table for the Zipf's law tests.
# These lines are commented out because they don't need to be rerun.
#python countwords.py books/isles.txt isles.dat
#python countwords.py books/abyss.txt abyss.dat
python plotcounts.py isles.dat isles.png
python plotcounts.py abyss.dat abyss.png
# Generate summary table
# This line is also commented out because it doesn't need to be rerun.
#python testzipf.py abyss.dat isles.dat > results.txt
Then, we would run our modified shell script using `bash run_pipeline.sh`.
But commenting out these lines, and subsequently uncommenting them, can be a hassle and source of errors in complicated pipelines.
What we really want is an executable description of our pipeline that allows software to do the tricky part for us: figuring out what steps need to be rerun.
For our pipeline SCons can execute the commands needed to run our analysis and plot our results. Like shell scripts it allows us to execute complex sequences of commands via a single shell command. Unlike shell scripts it explicitly records the dependencies between files - what files are needed to create what other files - and so can determine when to recreate our data files or image files, if our text files change. SCons can be used for any commands that follow the general pattern of processing files to create new files, for example:
- Run analysis scripts on raw data files to get data files that summarize the raw data (e.g. creating files with word counts from book text).
- Run visualization scripts on data files to produce plots (e.g. creating images of word counts).
- Parse and combine text files and plots to create papers.
- Compile source code into executable programs or libraries.
There are now many build tools available, for example GNU Make, Apache Ant, doit, and nmake for Windows. Which is best for you depends on your requirements, intended usage, and operating system. However, they all share the same fundamental concepts.
Also, you might come across build generation scripts, e.g. GNU Autoconf and CMake. Those tools do not run the pipelines directly, but rather generate files for use with the build tools.
As a Python-based build tool, SCons is available on Windows, macOS, and Linux. It is distributed through the `pip` and `conda` package managers, so it can be installed in the same Python scientific computing environments popular with computational science and engineering communities. SCons also uses Python as the configuration language, so the configuration files will feel familiar to many students.
Key Points
- SCons allows us to specify what depends on what and how to update things that are out of date.
Content from SConscript files
Last updated on 2025-04-16
Estimated time: 40 minutes
Overview
Questions
- How do I write a simple SConstruct file?
Objectives
- Recognize the key parts of the SConstruct file: tasks, targets, sources, and actions.
- Write a simple SConstruct file.
- Run SCons from the shell.
- Explain how to create aliases for collections of targets.
- Explain constraints on dependencies.
Create a file, called `SConstruct`, with the following content:
PYTHON
import os
env = Environment(ENV=os.environ.copy())
# Count words.
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt"],
    action=["python countwords.py books/isles.txt isles.dat"],
)
This is a build file, which for SCons is called an SConscript file - a file executed by SCons. `SConstruct` is the conventional name for the root configuration file. Secondary configuration files are named `SConscript` by convention, but can take any filename. Together, all SCons configuration files take the generic name SConscript files. From now on, SCons configuration files will be referred to collectively as SConscript files, but it is important to remember that projects usually start with the `SConstruct` file naming convention.
The syntax should be familiar to Python users because SCons uses Python as the configuration language. Note how the action resembles a line from our shell script.
Let us go through each section in turn:
- First we import the `os` module and create an SCons construction environment as a copy of the active shell environment. Most build managers inherit the active shell environment by default. SCons requires a little more effort, but this separation of construction environment from the external environment is valuable in complex computational science and engineering workflows which may require several mutually exclusive environments in a single workflow. For the purposes of this lesson, we will use a single construction environment inherited from the shell’s active Conda environment.
- `#` denotes a comment. Any text from `#` to the end of the line is ignored by SCons but could be very helpful for anyone reading your SConstruct file.
- `env.Command` is the generic task definition method used by SCons. Note that the task is defined inside the construction environment we created earlier. If there were more than one construction environment available, additional tasks could use unique, task-specific construction environments.
- `isles.dat` is a target, a file to be created, or built.
- `books/isles.txt` is a source, also called a dependency, a file that is needed to build or update the target. Targets can have one or more dependencies.
- `python countwords.py books/isles.txt isles.dat` is an action, a command to run to build or update the target using the sources. Targets can have one or more actions. These actions form a recipe to build the target from its sources and are executed similarly to a shell script.
- Targets, sources, and actions are passed as keyword arguments and may be a string or a list of strings.
- Together, the target, sources, and actions form a task.
Our task above describes how to build the target `isles.dat` using the action `python countwords.py` and the source `books/isles.txt`.
Information that was implicit in our shell script - that we are generating a file called `isles.dat` and that creating this file requires `books/isles.txt` - is now made explicit by SCons’ syntax.
Let’s first ensure we start from scratch and delete the `.dat` and `.png` files we created earlier:
By default, SCons looks for a root SConscript file, called `SConstruct`, and we can run SCons as follows:
By default, SCons prints several status messages and the actions it executes:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
scons: done building targets.
The status messages can be silenced with the `-Q` option.
Let’s see if we got what we expected.
The first 5 lines of `isles.dat` should look exactly like before.
The SConstruct File Does Not Have to be Called `SConstruct`
We don’t have to call our root SCons configuration file `SConstruct`. However, if we call it something else we need to tell SCons where to find it. We can do this using the `--sconstruct` option. For example, if our SConstruct file is named `MyOtherSConstruct`:
SCons does not require a specific file extension. The suffix `.scons` can be used to identify SConscript files that are not called `SConstruct` or `SConscript`, e.g. `install.scons`, `common.scons`, etc.
When we re-run our SConstruct file, SCons now informs us that:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
SCons uses the special target alias ‘`.`’ to indicate ‘all targets’. No command is run because our target, `isles.dat`, has now been created, and SCons will not create it again. To see how this works, let’s pretend to update one of the text files. Rather than opening the file in an editor, we can use the shell `touch` command to update its timestamp (which would happen if we did edit the file):
If we compare the timestamps of `books/isles.txt` and `isles.dat`, then we see that `isles.dat`, the target, is now older than `books/isles.txt`, its dependency:
OUTPUT
-rw-r--r-- 1 mjj Administ 323972 Jun 12 10:35 books/isles.txt
-rw-r--r-- 1 mjj Administ 182273 Jun 12 09:58 isles.dat
If we run SCons again, it does not recreate `isles.dat`, instead reporting that ‘all targets’ are up to date.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
This is a surprising result if you are already familiar with other build managers. Many build managers, such as GNU Make, use timestamps to track the state of source and target files. If we were using Make, Make would have re-created the `isles.dat` file.
By default SCons computes content signatures from the file content to track the state of source and target files. If the content of a file has not changed, it is considered up-to-date and SCons will not create it again. Computing the content signature takes more time than checking a timestamp, so SCons provides an option to use the more traditional timestamp state. However, in computational science and engineering workflows, which often contain tasks requiring hours or days to compute, the added time required to check file content is often a worthwhile trade-off, because a content check avoids needlessly relaunching long-running tasks more reliably than a simple timestamp check.
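The difference between a timestamp check and a content signature can be seen in plain Python. The sketch below is not how SCons is implemented internally; it assumes an MD5 digest as the signature, uses a throwaway temporary file rather than the lesson’s books, and `content_signature` is a hypothetical helper.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_signature(path):
    """Return the MD5 hex digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book.txt"
    book.write_text("A tale of some islands.\n")
    sig_original = content_signature(book)
    mtime_original = book.stat().st_mtime

    # Imitate `touch`: move the modification time forward, leave the bytes alone.
    later = mtime_original + 60
    os.utime(book, (later, later))
    sig_touched = content_signature(book)
    mtime_touched = book.stat().st_mtime

    # A real edit: append a newline, which changes the bytes.
    with book.open("a") as handle:
        handle.write("\n")
    sig_edited = content_signature(book)

print("timestamp changed by touch:", mtime_touched > mtime_original)
print("signature changed by touch:", sig_touched != sig_original)
print("signature changed by edit: ", sig_edited != sig_original)
```

A timestamp-based tool would react to the first change; a content-based check reacts only to the real edit.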
To observe SCons re-creating the target `isles.dat`, we must actually modify the `books/isles.txt` file. Any change to the file contents, even adding a newline, will change the content signature computed as an `md5sum`. If we run `md5sum` ourselves, we can see the signature change before and after the file edit.
OUTPUT
6cc2c020856be849418f9d744ac1f5ee books/isles.txt
Append an empty newline to the `books/isles.txt` file and check the `md5sum` signature again.
OUTPUT
22b5adfc3b267e2e658ba75de4aeb74b books/isles.txt
We can see that appending a blank newline changes the computed content signature. If we run SCons again, it will re-create `isles.dat` because the content of the source file `books/isles.txt` has changed.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
scons: done building targets.
When it is asked to build a target, SCons checks the ‘content signature’ of both the target and its sources and the ‘action signature’ of the associated action list. If any source or action content has changed since the target was built, then the actions are re-run to update the target. Using this approach, SCons knows to only rebuild the files that, either directly or indirectly, depend on the file that changed. This is called an incremental build.
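The rebuild decision described above can be modelled in a few lines. This is a deliberately simplified sketch, not SCons’ actual bookkeeping: `snapshot` and `needs_rebuild` are hypothetical helpers, file contents are passed in as strings, and the `--verbose` flag is invented purely to alter the action text.

```python
import hashlib


def signature(text):
    """MD5 hex digest of a string, standing in for a content or action signature."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def snapshot(sources, action):
    """Signatures a build tool might record after a successful build."""
    return {
        "sources": {name: signature(content) for name, content in sources.items()},
        "action": signature(action),
    }


def needs_rebuild(recorded, sources, action):
    """Rebuild if nothing was recorded, or if any source or the action changed."""
    return recorded is None or recorded != snapshot(sources, action)


sources = {"books/isles.txt": "A tale of some islands.\n"}
action = "python countwords.py books/isles.txt isles.dat"

first_build = needs_rebuild(None, sources, action)
recorded = snapshot(sources, action)
unchanged = needs_rebuild(recorded, sources, action)

edited_sources = {"books/isles.txt": "A tale of some islands.\n\n"}
after_source_edit = needs_rebuild(recorded, edited_sources, action)

# "--verbose" is a made-up flag used only to change the action string.
after_action_edit = needs_rebuild(recorded, sources, action + " --verbose")

print(first_build, unchanged, after_source_edit, after_action_edit)
```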
SConscript Files as Documentation
By explicitly recording the inputs to and outputs from steps in our analysis and the dependencies between files, SConscript files act as a type of documentation, reducing the number of things we have to remember.
Let’s add another task to the end of `SConstruct`:
PYTHON
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt"],
    action=["python countwords.py books/abyss.txt abyss.dat"],
)
If we run SCons, then we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/abyss.txt abyss.dat
scons: done building targets.
SCons builds the second target but not the first target. The default behavior of SCons is to build all default targets and, unless otherwise specified, all targets are added to the default targets list.
If we do not want to build all targets, we can also build a specific target by name. First, confirm that running SCons again reports the special target ‘`.`’ up to date to indicate that all targets are up to date.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
Then confirm that when specifying a target, SCons only reports on the requested target.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `abyss.dat' is up to date.
scons: done building targets.
“Up to Date” Versus “Nothing to be Done”
If we ask SCons to build a file that already exists and is up to date, then SCons informs us that:
OUTPUT
scons: `isles.dat' is up to date.
If we ask SCons to build a file that exists but for which there is no rule in our `SConstruct` file, then we get a message like:
OUTPUT
scons: Nothing to be done for `countwords.py'.
`up to date` means that the `SConstruct` file has a task with one or more actions whose target is the name of a file (or directory) and the file is up to date.
`Nothing to be done` means that the file exists but the `SConstruct` file has no task for it. Targets that are defined, but have no action, result in an empty ‘Building targets …’ message without issuing any commands.
We may want to remove all our data files so we can explicitly recreate them all. SCons provides the `--clean` command line option that will remove targets by request. We can clean all default targets
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Cleaning targets ...
Removed abyss.dat
Removed isles.dat
scons: done cleaning targets.
or clean all targets with the special target ‘`.`’, regardless of the default list contents
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Cleaning targets ...
Removed abyss.dat
Removed isles.dat
scons: done cleaning targets.
or clean specific targets by name
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Cleaning targets ...
Removed abyss.dat
scons: done cleaning targets.
We may want to simplify specification of some, but not all, targets. We can add an alias to reference all of the data files.
This simplifies calling a non-default target list such that we do not have to write out each target by name. The following two executions of SCons are equivalent.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
scons: done building targets.
When requesting specific targets, the requested targets are reported up-to-date according to the name used on the command line. Calling two targets by name results in individual reports, one per target.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `isles.dat' is up to date.
scons: `abyss.dat' is up to date.
scons: done building targets.
Calling the collector alias `dats` results in a single report for the alias.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `dats' is up to date.
scons: done building targets.
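How command-line names map onto targets can be pictured with a small lookup. The sketch below is an illustration only, not SCons’ algorithm: `resolve_targets` is a hypothetical function, and it treats ‘`.`’ simply as ‘every known target’, as this lesson does.

```python
def resolve_targets(requested, all_targets, aliases, defaults=None):
    """Expand command-line names into the list of targets to consider.

    With no names given, fall back to the default list; when no default
    list was declared, every target counts as a default.
    """
    if not requested:
        requested = defaults if defaults is not None else all_targets
    resolved = []
    for name in requested:
        if name == ".":
            expansion = all_targets
        else:
            # An alias expands to its members; any other name stands for itself.
            expansion = aliases.get(name, [name])
        for target in expansion:
            if target not in resolved:
                resolved.append(target)
    return resolved


all_targets = ["isles.dat", "abyss.dat"]
aliases = {"dats": ["isles.dat", "abyss.dat"]}

by_alias = resolve_targets(["dats"], all_targets, aliases)
by_name = resolve_targets(["isles.dat", "abyss.dat"], all_targets, aliases)
no_arguments = resolve_targets([], all_targets, aliases)
with_default = resolve_targets([], all_targets, aliases, defaults=["abyss.dat"])

print(by_alias, by_name, no_arguments, with_default)
```

The alias and the explicit names select the same files; only the name used in the status report differs.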
Dependencies
The order of rebuilding dependencies is arbitrary. Required sources are always built before targets, but if two targets are independent of one another, you should not assume that they will be built in the order in which they are listed.
Dependencies must form a directed acyclic graph. A target cannot depend on a dependency which itself, or one of its dependencies, depends on that target.
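Both constraints can be demonstrated with Python’s standard `graphlib` module (Python 3.9 or later). This sketch only mirrors the dependency graph of our SConstruct file in a dictionary; it does not use SCons, and the cyclic edge is added artificially to show the failure.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the names in its set (target -> dependencies).
dependencies = {
    "dats": {"isles.dat", "abyss.dat"},
    "isles.dat": {"books/isles.txt"},
    "abyss.dat": {"books/abyss.txt"},
}

# Dependencies always come before the things built from them, but the
# relative order of independent nodes (isles.dat, abyss.dat) is not fixed.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)

# Make a source depend on its own target: the graph is no longer acyclic.
cyclic = dict(dependencies)
cyclic["books/isles.txt"] = {"isles.dat"}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print("cycle detected:", cycle_detected)
```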
Our `SConstruct` now looks like this:
PYTHON
import os
env = Environment(ENV=os.environ.copy())
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt"],
    action=["python countwords.py books/isles.txt isles.dat"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt"],
    action=["python countwords.py books/abyss.txt abyss.dat"],
)
env.Alias("dats", ["isles.dat", "abyss.dat"])
The following figure shows a graph of the dependencies embodied within our SConstruct file, involved in building the `dats` alias:

Write Two New Tasks
- Write a new task for `last.dat`, created from `books/last.txt`.
- Update the `dats` alias with this target.
- Write a new task for `results.txt`, which creates the summary table. The rule needs to:
  - Depend upon each of the three `.dat` files.
  - Invoke the action `python testzipf.py abyss.dat isles.dat last.dat > results.txt`.
- Add this target to the default target list so that it is the default target.
The starting SConstruct file is here.
See this file for a solution.
The following figure shows the dependencies embodied within our SConstruct file, involved in building the `results.txt` target:

Key Points
- SConstruct is the default name of the root SCons configuration file.
- SCons configuration files are collectively called SConscript files.
- SConscript files are Python files.
- SCons tasks are attached to a construction environment, which can be inherited from the shell’s active environment.
- Use `#` for comments in SConscript files.
- Write tasks as lists of targets, sources, and actions with the `Command` method.
- Use an `Alias` to collect targets under a convenient name for shorter build commands.
- Use the `Default` function to limit the number of default targets to a subset of all targets.
Content from Special Substitution Variables
Last updated on 2025-04-16
Estimated time: 15 minutes
Overview
Questions
- How can I abbreviate the tasks in my SConscript files?
Objectives
- Use SCons special substitution variables to remove duplication in SConscript files.
- Explain why shell wildcards in dependencies can cause problems.
After the exercise at the end of the previous episode, our SConstruct file looked like this:
PYTHON
import os
env = Environment(ENV=os.environ.copy())
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt"],
    action=["python countwords.py books/isles.txt isles.dat"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt"],
    action=["python countwords.py books/abyss.txt abyss.dat"],
)
env.Command(
    target=["last.dat"],
    source=["books/last.txt"],
    action=["python countwords.py books/last.txt last.dat"],
)
env.Alias("dats", ["isles.dat", "abyss.dat", "last.dat"])
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py abyss.dat isles.dat last.dat > results.txt"],
)
env.Default(["results.txt"])
Our SConstruct file has a lot of duplication. For example, the names of text files and data files are repeated in many places throughout the file. SConscript files are a form of code and, in any code, repeated code can lead to problems e.g. we rename a data file in one part of the SConscript file but forget to rename it elsewhere.
D.R.Y. (Don’t Repeat Yourself)
In many programming languages, the bulk of the language features are there to allow the programmer to describe long-winded computational routines as short, expressive, beautiful code. Features in Python or R or Java, such as user-defined variables and functions are useful in part because they mean we don’t have to write out (or think about) all of the details over and over again. This good habit of writing things out only once is known as the “Don’t Repeat Yourself” principle or D.R.Y.
Let us set about removing some of the repetition from our SConstruct file.
In our `results.txt` task we duplicate the data file names and the name of the results file:
PYTHON
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py abyss.dat isles.dat last.dat > results.txt"],
)
Looking at the results file name first, we can replace it in the action with `${TARGET}`:
PYTHON
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py abyss.dat isles.dat last.dat > ${TARGET}"],
)
`${TARGET}` is an SCons special variable which means ‘the target of the current task’. When SCons is run it will replace this variable with the target name.
We can replace the sources in the action with `${SOURCES}`:
PYTHON
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py ${SOURCES} > ${TARGET}"],
)
`${SOURCES}` is another special substitution variable which means ‘all the dependencies of the current task’. Again, when SCons is run it will replace this variable with the sources.
Let’s clean our workflow and re-run our task:
We get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
python testzipf.py isles.dat abyss.dat last.dat > results.txt
scons: done building targets.
1. Nothing.
The content of the `*.dat` files has not changed, so `results.txt` is up to date.
If you run:
you will find that the `.dat` files as well as `results.txt` are recreated.
If you run:
you will find that the `results.txt` file is recreated because the content signature of the `.dat` files has changed. However, the `.dat` files are not recreated. Despite our edit, the source and action signatures of the `.dat` tasks have not changed. It is important that you never edit targets manually, to avoid out-of-sync and reproducibility errors arising in the middle of your workflow. You can rely on SCons to know when to rebuild targets if you have well defined tasks with complete target and source lists.
If you ran this command or manually edited the `.dat` files, be sure to clean and rebuild them to remove the manually edited lines.
As we saw, `${SOURCES}` means ‘all the dependencies of the current task’. This works well for `results.txt` as its action treats all the dependencies the same - as the input for the `testzipf.py` script.
However, for some tasks, we may want to treat the first dependency differently. For example, our tasks for `.dat` use their first (and only) dependency specifically as the input file to `countwords.py`. If we add additional dependencies (as we will soon do) then we don’t want these being passed as input files to `countwords.py` as it expects only one input file to be named when it is invoked.
SCons allows Pythonic, zero-based indexing of special substitution variables `${SOURCES}` and `${TARGETS}` for this use case. For example, `${SOURCES[0]}` means ‘the first dependency of the current task’.
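To make the substitution concrete, here is a toy expander written in plain Python. It is a sketch under stated assumptions, not SCons’ substitution engine: `expand` is a hypothetical function that handles only `${TARGET}`, `${TARGETS}`, and `${SOURCES}`, with an optional zero-based index, for a task with a single target.

```python
import re

# Matches ${TARGET}, ${TARGETS}, ${SOURCES}, optionally followed by [index].
_PATTERN = re.compile(r"\$\{(TARGETS?|SOURCES)(?:\[(\d+)\])?\}")


def expand(action, target, sources):
    """Replace the special variables in an action string with file names."""
    values = {"TARGET": [target], "TARGETS": [target], "SOURCES": list(sources)}

    def replace(match):
        name, index = match.group(1), match.group(2)
        if index is not None:
            # Zero-based indexing picks a single file.
            return values[name][int(index)]
        # Without an index, list every file separated by spaces.
        return " ".join(values[name])

    return _PATTERN.sub(replace, action)


dat_files = ["isles.dat", "abyss.dat", "last.dat"]

whole_list = expand("python testzipf.py ${SOURCES} > ${TARGET}", "results.txt", dat_files)
first_only = expand("${SOURCES[0]}", "results.txt", dat_files)

print(whole_list)
print(first_only)
```

Indexing lets an action pick out one file while the task still records every dependency in its source list.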
Rewrite `.dat` Tasks to Use Special Substitution Variables
Rewrite each `.dat` task to use the special substitution variables `${TARGET}` (‘the target of the current task’) and `${SOURCES[0]}` (‘the first dependency of the current task’). This file contains the SConstruct immediately before the challenge.
See this file for a solution to this challenge.
Key Points
- Use `${TARGET}` to refer to the target of the current task.
- Use `${SOURCES}` to refer to the dependencies of the current task.
- Use `${SOURCES[0]}` to refer to the first dependency of the current task.
Content from Dependencies on Data and Code
Last updated on 2025-04-16
Estimated time: 20 minutes
Overview
Questions
- How can I write an SConscript file to update things when my scripts have changed rather than my input files?
Objectives
- Output files are a product not only of input files but of the scripts or code that created the output files.
- Recognize and avoid false dependencies.
Our SConstruct file now looks like this:
PYTHON
import os
env = Environment(ENV=os.environ.copy())
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["last.dat"],
    source=["books/last.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Alias("dats", ["isles.dat", "abyss.dat", "last.dat"])
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py ${SOURCES} > ${TARGET}"],
)
env.Default(["results.txt"])
Our data files are produced using not only the input text files but also the script `countwords.py` that processes the text files and creates the data files. A change to `countwords.py` (e.g. adding a new column of summary data or removing an existing one) results in changes to the `.dat` files it outputs. So, let’s pretend to edit `countwords.py`, using `echo` to append a blank line, and re-run SCons:
Nothing happens! Though we’ve updated `countwords.py`, our data files are not updated because our rules for creating `.dat` files don’t record any dependencies on `countwords.py`.
We need to add `countwords.py` as a dependency of each of our data files also:
PYTHON
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["last.dat"],
    source=["books/last.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
If we re-run SCons, then we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: done building targets.
SCons tracks the source list as part of the task signature. Adding a new source triggers a rebuild of the targets. Now if we edit the `countwords.py` file, the targets will re-build again.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: done building targets.
Dry run
`scons` can show the commands it will execute without actually running them if we pass the `--dry-run` flag:
This gives the same output to the screen as without the `--dry-run` flag, but the commands are not actually run. Using this ‘dry-run’ mode is a good way to check that you have set up your SConscript tasks properly before actually running the commands.
You can also get an explanation for why SCons would like to recreate the targets with the `--debug=explain` option. This is helpful when the dry run shows commands you did not expect to run and you need help tracking down the incorrect task definition.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: rebuilding `isles.dat' because `countwords.py' changed
python countwords.py books/isles.txt isles.dat
scons: rebuilding `abyss.dat' because `countwords.py' changed
python countwords.py books/abyss.txt abyss.dat
scons: rebuilding `last.dat' because `countwords.py' changed
python countwords.py books/last.txt last.dat
scons: done building targets.
The following figure shows a graph of the dependencies that are
involved in building the target results.txt
. Notice the
recently added dependencies countwords.py
and
testzipf.py
. This is how the SConstruct should look after
completing the rest of the exercises in this episode.

Why Don’t the .txt
Files Depend
on countwords.py
?
.txt
files are input files and as such have no
dependencies. To make these depend on countwords.py
would
introduce a false
dependency which is not desirable.
Intuitively, we should also add countwords.py
as a
dependency for results.txt
, because the final table should
be rebuilt if we remake the .dat
files. However, it turns
out we don’t have to do that! Let’s see what happens to
results.txt
when we update countwords.py
:
then we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: `results.txt' is up to date.
scons: done building targets.
The whole pipeline is triggered, starting with the .dat
files and finishing with the requested results.txt
file! To
understand this, note that according to the dependency figure,
results.txt
depends on the .dat
files. The
update of countwords.py
triggers an update of the
*.dat
files. Finally, SCons reports that the requested
results.txt
file is still up to date.
In a timestamp-based build manager, such as Make, the build manager
would see that the dependencies (the .dat
files) are newer
than the target file (results.txt
) and thus it recreates
results.txt
.
With SCons, the results.txt
file is not recreated
because the recreated intermediate .dat
files contain the
same content as the first time we created the results.txt
file, which is therefore not out-of-date.
Both behaviors are examples of the power of a build manager: updating a subset of the files in the pipeline triggers rerunning the appropriate downstream steps. The additional SCons behavior of stopping a pipeline early when intermediate file content has not changed is desirable for computational science and engineering workflows, where some tasks may require hours to days to complete.
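The difference between the two decisions can be sketched in plain Python. This is only an illustration under stated assumptions, not how SCons or Make is implemented: the helper names `content_signature`, `needs_rebuild_by_timestamp`, and `needs_rebuild_by_content` are hypothetical, the modification times are made-up numbers, and SHA-256 is used simply as an example content hash.

```python
import hashlib


def content_signature(data: bytes) -> str:
    """Return a hash of file content, standing in for a content signature."""
    return hashlib.sha256(data).hexdigest()


def needs_rebuild_by_timestamp(source_mtime: float, target_mtime: float) -> bool:
    """Timestamp rule: rebuild when the source is newer than the target."""
    return source_mtime > target_mtime


def needs_rebuild_by_content(source_data: bytes, recorded_signature: str) -> bool:
    """Content rule: rebuild only when the source content has changed."""
    return content_signature(source_data) != recorded_signature


# Signature recorded the last time results.txt was built from isles.dat.
old_dat = b"the 3822 6.7371760973\n"
recorded = content_signature(old_dat)

# isles.dat is regenerated later with identical content but a newer timestamp.
new_dat = b"the 3822 6.7371760973\n"
timestamp_decision = needs_rebuild_by_timestamp(source_mtime=200.0, target_mtime=100.0)
content_decision = needs_rebuild_by_content(new_dat, recorded)

print(timestamp_decision)  # the timestamp rule asks for a rebuild
print(content_decision)    # the content rule does not
```

With identical regenerated content, only the timestamp rule asks for `results.txt` to be rebuilt, which mirrors the behavior described above.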
1.
only last.dat
is recreated.
Follow the dependency tree and consider the effect of an empty line on the word count calculations to understand the answer(s).
testzipf.py
as a Dependency of
results.txt
.
What would happen if you added testzipf.py
as a dependency
of results.txt
, and why?
If you change the rule for the results.txt
file like
this:
PYTHON
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat", "testzipf.py"],
    action=["python testzipf.py ${SOURCES} > ${TARGET}"],
)
testzipf.py
becomes a part of ${SOURCES}
,
thus the post-substitution command becomes
This results in an error from testzipf.py
as it tries to
parse the script as if it were a .dat
file. Try this by
running:
You’ll get
ERROR
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python testzipf.py isles.dat abyss.dat last.dat testzipf.py > results.txt
Traceback (most recent call last):
File "/home/roppenheimer/scons-lesson/testzipf.py", line 19, in <module>
counts = load_word_counts(input_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/roppenheimer/scons-lesson/countwords.py", line 39, in load_word_counts
counts.append((fields[0], int(fields[1]), float(fields[2])))
^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'countwords'
scons: *** [results.txt] Error 1
scons: building terminated because of errors.
We still have to add the testzipf.py
script as a
dependency to results.txt
. Given the answer to the
challenge above, we need to make a couple of small changes so that we
can still use special substitution variables.
We’ll move testzipf.py
to be the first source. We could
then edit the action so that we pass all the dependencies as arguments
to python using ${SOURCES}
.
PYTHON
env.Command(
    target=["results.txt"],
    source=["testzipf.py", "isles.dat", "abyss.dat", "last.dat"],
    action=["python ${SOURCES} > ${TARGET}"],
)
But it would be helpful to clarify the unique role of
testzipf.py
as a Python script. We can clarify the intended
roles for different source files by indexing the sources in our action.
SCons allows for Pythonic slicing when indexing special substitution
variables.
PYTHON
env.Command(
    target=["results.txt"],
    source=["testzipf.py", "isles.dat", "abyss.dat", "last.dat"],
    action=["python ${SOURCES[0]} ${SOURCES[1:]} > ${TARGET}"],
)
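Because the indexing follows Python's list rules, its effect can be previewed with an ordinary Python list. The sketch below only mimics the substitution with string formatting; it does not call SCons, and the variable names are chosen for illustration.

```python
# Plain-Python preview of how ${SOURCES[0]} and ${SOURCES[1:]} split the list.
sources = ["testzipf.py", "isles.dat", "abyss.dat", "last.dat"]
target = "results.txt"

script = sources[0]        # the first source: the Python script
data_files = sources[1:]   # every remaining source: the .dat files

command = f"python {script} {' '.join(data_files)} > {target}"
print(command)
```

The script appears exactly once, in the position of the program to run, and is no longer passed to itself as a data file.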
Index the .dat
task actions
Index the sources for .dat
task actions without changing
the source file order. Remember that SCons allows Pythonic slicing when
indexing special substitution variables.
Where We Are
This SConstruct file contains everything done so far in this topic.
Key Points
- SCons results depend on processing scripts as well as data files.
- Dependencies are transitive: if A depends on B and B depends on C, a change to C will indirectly trigger the pipeline to update to A.
- SCons content signatures help prevent recomputing work if intermediate targets’ contents do not change after recreation.
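The transitive behavior summarized in the key points can be sketched with a small dictionary-based graph walk. This is a simplified model, not SCons internals: the `dependencies` table is written by hand to match this episode's tasks, and `affected_targets` is a hypothetical helper. It lists the targets SCons would need to re-check; as described above, SCons may still find a downstream target up to date when the regenerated content is unchanged.

```python
# Hand-written dependency table: each target maps to its direct sources.
dependencies = {
    "results.txt": ["isles.dat", "abyss.dat", "last.dat", "testzipf.py"],
    "isles.dat": ["books/isles.txt", "countwords.py"],
    "abyss.dat": ["books/abyss.txt", "countwords.py"],
    "last.dat": ["books/last.txt", "countwords.py"],
}


def affected_targets(changed, dependencies):
    """Return every target that directly or indirectly depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for target, sources in dependencies.items():
            if current in sources and target not in affected:
                affected.add(target)
                frontier.append(target)  # follow the chain further downstream
    return affected


print(sorted(affected_targets("countwords.py", dependencies)))
print(sorted(affected_targets("books/last.txt", dependencies)))
```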
Content from Builders and Pseudo-builders
Last updated on 2025-04-16
Estimated time: 10 minutes
Overview
Questions
- How can I define common task operations for similar files?
Objectives
- Write SCons builders and pseudo-builders
Our SConstruct file still has repeated content. The tasks for each
.dat
file are identical apart from the text and data file
names. We can replace these tasks with a single builder which can be used to build any
.dat
file from a .txt
file in
books/
:
After creating the custom builder, we need to add it to the construction environment to make it available for task definitions.
PYTHON
env = Environment(ENV=os.environ.copy())
env.Append(BUILDERS={"CountWords": count_words_builder})
Now we can convert our .dat
tasks from the
Command
to CountWords
builder.
PYTHON
env.CountWords(
    target=["isles.dat"],
    source=["books/isles.txt", "countwords.py"],
)
env.CountWords(
    target=["abyss.dat"],
    source=["books/abyss.txt", "countwords.py"],
)
env.CountWords(
    target=["last.dat"],
    source=["books/last.txt", "countwords.py"],
)
Custom builders like CountWords
allow us to apply the
same action to many tasks.
If we re-run SCons,
then we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: done building targets.
We can further simplify the task definition by moving the text file
handling inside a pseudo-builder function. Pseudo-builders behave like
builders, but allow flexibility in task construction through
user-defined arguments. We will use the pathlib
module to
help us construct OS-agnostic paths and perform path manipulation.
At the top of your SConstruct
file, update the imports
as below. Then define a new count_words
pseudo-builder
function after the imports to replace the
count_words_builder
and add it to the construction
environment.
PYTHON
import os
import pathlib


def count_words(env, data_file):
    """Pseudo-builder to run the `countwords.py` script and produce `.dat` target

    Assumes that the source text file is found in `books/{data_file}.txt`

    :param env: SCons construction environment. Do not provide when using this
        function with the `env.AddMethod` and `env.CountWords` access style.
    :param data_file: String name of the data file to create.
    """
    data_path = pathlib.Path(data_file)
    text_file = pathlib.Path("books") / data_path.with_suffix(".txt")
    target_nodes = env.Command(
        target=[data_file],
        source=[text_file, "countwords.py"],
        action=["python ${SOURCES[-1]} ${SOURCES[0]} ${TARGET}"],
    )
    return target_nodes


env = Environment(ENV=os.environ.copy())
env.AddMethod(count_words, "CountWords")
This pseudo-builder has further reduced the interface necessary to
define the .dat
tasks, which now can be re-written as
For students familiar with GNU Make, pseudo-builders are similar to Make ‘pattern rules’, but pseudo-builders are both more verbose and more flexible. Pseudo-builders require a full Python function definition, but they are not limited to simple file extension pattern matching and can implement whatever task handling the user requires.
A pseudo-builder alone will not allow us to match arbitrary files
using the .dat
file extension. If we desire the full Make
‘pattern rule’ behavior, we can accept a target name and match it to our
pseudo-builder with the SCons COMMAND_LINE_TARGETS
variable.
Add the following to the bottom of your SConstruct file
PYTHON
for target in COMMAND_LINE_TARGETS:
    if pathlib.Path(target).suffix == ".dat":
        env.CountWords(target)
Now we can define tasks for new files not found in our pre-defined tasks as
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/sierra.txt sierra.dat
scons: done building targets.
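The matching performed by that loop, together with the path handling inside the pseudo-builder, can be tried outside of SCons. In this sketch the list `command_line_targets` is a hypothetical stand-in for the list SCons provides, and `pathlib.PurePosixPath` is used instead of `pathlib.Path` only so that the printed separators are the same on every operating system.

```python
import pathlib

# Hypothetical stand-in for the targets requested on the command line.
command_line_targets = ["sierra.dat", "results.txt"]

matched = []
for target in command_line_targets:
    target_path = pathlib.PurePosixPath(target)
    if target_path.suffix == ".dat":
        # Same derivation as the pseudo-builder: books/<name>.txt
        text_file = pathlib.PurePosixPath("books") / target_path.with_suffix(".txt")
        matched.append((target, str(text_file)))

for target, text_file in matched:
    print(f"{target} would be built from {text_file}")
```

Only the `.dat` request is matched; `results.txt` is left to its explicitly defined task.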
Where We Are
This SConstruct file contains all of our work so far.
Key Points
- Use the Builder function and Append to the construction environment BUILDERS dictionary to define common actions.
- Use the AddMethod function and Python functions to define pseudo-builders with custom tailored task handling.
- Use the special SCons variable COMMAND_LINE_TARGETS to perform dynamic handling that depends on command line target requests.
Content from Variables
Last updated on 2025-04-16
Estimated time: 20 minutes
Overview
Questions
- How can I eliminate redundancy in my SConscript files?
Objectives
- Use variables in SConscript files.
- Explain the benefits of decoupling configuration from computation.
Despite our efforts, our SConstruct still has repeated content,
i.e. the program we use to run our scripts, python
.
Additionally, if we renamed our scripts we’d have to hunt through our
SConstruct file in multiple places.
We can introduce Python variables after the import statements
in SConstruct
to hold our script name:
This is a variable assignment: COUNT_SOURCE is assigned the value "countwords.py". It behaves like (and actually is) a normal Python variable assignment. The all-capitals naming convention indicates that the variable is intended for use as a setting or constant value.
We can do the same thing with the interpreter language used to run the script:
Similar to the SCons special substitution variables, we can define
any number of per-task or per-builder substitution variables with
keyword arguments. The same ${...}
substitution syntax
tells SCons to replace a task action string variable with its value when
SCons is run.
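As a rough analogy, Python's standard `string.Template` class uses the same `${name}` placeholder style. The sketch below is only an analogy: SCons uses its own substitution engine, which also understands expressions such as `${SOURCES[0]}` that `string.Template` does not, and the keyword names here are chosen for illustration.

```python
from string import Template

# An action-like string with named placeholders.
action = Template("${language} ${count_source} ${source} ${target}")

# Keyword arguments supply the values, much like per-task keyword arguments.
command = action.substitute(
    language="python",
    count_source="countwords.py",
    source="books/isles.txt",
    target="isles.dat",
)
print(command)
```

Changing the value passed as `language` changes every command built from the template, without editing the template string itself.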
Defining the variable LANGUAGE
in this way avoids
repeating python
in our SConstruct file, and allows us to
easily change how our script is run (e.g. we might want to use a
different version of Python and need to change python
to
python2
– or we might want to rewrite the script using
another language (e.g. switch from Python to R)).
In the count_words
pseudo-builder function we will
define optional arguments language
and
count_source
, which are defined to default as
LANGUAGE
and COUNT_SOURCE
respectively and
passed through as Command
task keyword arguments. This
tells SCons to replace the variable language
with its value
python
, and to replace the variable
count_source
with its value countwords.py
.
We will define and use the intermediate function keyword argument
variables instead of using the upper case variables directly to avoid
mixing up the function’s scope with the SConstruct
scope.
This is a detail of good practice in Python development, and since
SConscript files are Python code, you should follow the usual Python
practices and style guides wherever possible.
Use Variables
Update SConstruct
so that the .dat
rule
references the variable count_source
and
language
. Then do the same for the testzipf.py
script and the results.txt
rule, using
ZIPF_SOURCE
as the variable name.
This SConstruct file contains a solution to this challenge.
We place variables intended for use as configuration constants at the
top of an SConstruct file so they are easy to find and modify.
Alternatively, we can pull them out into a new file that just holds
variable definitions (i.e. delete them from the original SConstruct
file). Because SCons uses Python as the configuration language, we can
also move our custom builders and pseudo-builders. Let us create
scons_lesson_configuration.py
from the content below.
PYTHON
import pathlib

COUNT_SOURCE = "countwords.py"
LANGUAGE = "python"
ZIPF_SOURCE = "testzipf.py"


def count_words(env, data_file, language=LANGUAGE, count_source=COUNT_SOURCE):
    """Pseudo-builder to produce `.dat` targets from the `countwords.py` script

    Assumes that the source text file is found in `books/{data_file}.txt`

    :param env: SCons construction environment. Do not provide when using this
        function with the `env.AddMethod` and `env.CountWords` access style.
    :param data_file: String name of the data file to create.
    """
    data_path = pathlib.Path(data_file)
    text_file = pathlib.Path("books") / data_path.with_suffix(".txt")
    target_nodes = env.Command(
        target=[data_file],
        source=[text_file, count_source],
        action=["${language} ${count_source} ${SOURCES[0]} ${TARGET}"],
        language=language,
        count_source=count_source,
    )
    return target_nodes
An added benefit to moving our custom functions into a file with the
.py
extension is that we can use automated documentation
tools, such as Sphinx, to build
project documentation.
We can then import scons_lesson_configuration.py
into
the SConstruct file with a standard Python import:
Note that the above import statement merges the module namespace into
the SConstruct file namespace. We must be careful to avoid re-defining
variable and function names provided by
scons_lesson_configuration.py
in our
SConstruct
file, which would overwrite the names provided
by our module and lead to unexpected behavior.
We can re-run SCons to see that everything still works:
Where We Are
This SConstruct file and this Python module contain all of our work so far.
Key Points
- Define variables by assigning values to names with Python syntax
- Reference variables in action strings using SCons substitution
syntax
${...}
.
Content from Functions
Last updated on 2025-04-16
Estimated time: 25 minutes
Overview
Questions
- How else can I eliminate redundancy in my SConscript files?
Objectives
- Write SConscript files that use functions to match and transform sets of files.
At this point, we have the following SConstruct file:
PYTHON
import os
import pathlib

from scons_lesson_configuration import *

env = Environment(ENV=os.environ.copy())
env.AddMethod(count_words, "CountWords")

env.CountWords("isles.dat")
env.CountWords("abyss.dat")
env.CountWords("last.dat")
env.Alias("dats", ["isles.dat", "abyss.dat", "last.dat"])

env.Command(
    target=["results.txt"],
    source=[ZIPF_SOURCE, "isles.dat", "abyss.dat", "last.dat"],
    action=["${language} ${zipf_source} ${SOURCES[1:]} > ${TARGET}"],
    language=LANGUAGE,
    zipf_source=ZIPF_SOURCE,
)
env.Default(["results.txt"])

for target in COMMAND_LINE_TARGETS:
    if pathlib.Path(target).suffix == ".dat":
        env.CountWords(target)
Python and SCons have many functions which can be used to write
more complex tasks. One example is Glob
. Glob
gets a list of files matching some pattern, which we can then save in a
variable. So, for example, we can get a list of all our text files
(files ending in .txt
) and save these in a variable by
updating the beginning of our scons_lesson_configuration.py
file:
PYTHON
import pathlib
import SCons.Script
COUNT_SOURCE = "countwords.py"
LANGUAGE = "python"
ZIPF_SOURCE = "testzipf.py"
TEXT_FILES = SCons.Script.Glob("books/*.txt")
Because our new Python module is no longer part of the SConstruct
file, it does not have direct access to the special SCons namespace. We
need to import SCons like a Python package to use the Glob
function.
We can add a custom
command-line option --variables
to our SConstruct file
to print the TEXT_FILES
value and exit configuration prior
to building using Python f-string
syntax:
PYTHON
AddOption(
    "--variables",
    action="store_true",
    default=False,
    help="Print the text files returned by Glob and exit (default: %default)",
)
if GetOption("variables"):
    text_file_strings = [str(node) for node in TEXT_FILES]
    print(f"TEXT_FILES: {text_file_strings}")
    Exit(0)
print and Exit
We can use the Python built-in print
function to print
to STDOUT, which is our terminal by default. If we needed to execute a
shell command, we could also use the Execute
function to run a command immediately during configuration instead of
defining a task.
The SCons Exit
function exits the configuration
immediately. Here we use zero, the conventional ‘success’ code of most
shells, because the intended behavior of the --variables
option is documented as an early exit from configuration.
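The keyword arguments accepted by AddOption follow the conventions of the `add_option` method in Python's standard `optparse` module, so the behavior of a `store_true` flag can be explored without SCons. The sketch below uses `optparse` directly as a stand-in; it is not the SCons option parser, and the argument lists passed to `parse_args` are made up for illustration.

```python
import optparse

parser = optparse.OptionParser()
parser.add_option(
    "--variables",
    action="store_true",
    default=False,
    help="Print the text files returned by Glob and exit (default: %default)",
)

# Flag present: the option value becomes True.
with_flag, _ = parser.parse_args(["--variables"])
print(with_flag.variables)

# Flag absent: the option keeps its default value.
without_flag, _ = parser.parse_args([])
print(without_flag.variables)
```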
If we run SCons:
We get:
OUTPUT
scons: Reading SConscript files ...
TEXT_FILES: ['books/abyss.txt', 'books/isles.txt', 'books/last.txt', 'books/sierra.txt']
Note how sierra.txt
is now included too. There are some
progress messages missing from the output due to the early
Exit
. The configuration phase is exited immediately and
there is no build phase.
We can add a list of data files to the
scons_lesson_configuration.py
module using a list comprehension that performs path manipulation on
the text files list. We will use the
pathlib
module again for OS-agnostic path separators.
PYTHON
DATA_FILES = [
    pathlib.Path(str(text_file)).with_suffix(".dat").name
    for text_file in TEXT_FILES
]
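The steps packed into that comprehension (string conversion, suffix replacement, and parent directory removal) can be traced one at a time. In this sketch plain strings stand in for the SCons node objects returned by Glob, and the variable names are chosen for illustration.

```python
import pathlib

# Plain strings stand in for the SCons nodes returned by Glob.
text_files = ["books/abyss.txt", "books/isles.txt"]

data_files = []
for text_file in text_files:
    path = pathlib.PurePosixPath(str(text_file))  # books/abyss.txt
    renamed = path.with_suffix(".dat")            # books/abyss.dat
    data_files.append(renamed.name)               # abyss.dat (parent removed)

print(data_files)
```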
We can extend the --variables
option in SConstruct file
to show the value of DATA_FILES
too. We cast the SCons node
objects into a string, then create a pathlib.Path
object,
and finally trim the parent directory to get our data file name. These
operations return a list of strings, so we can print
the
list directly.
PYTHON
if GetOption("variables"):
    text_file_strings = [str(node) for node in TEXT_FILES]
    print(f"TEXT_FILES: {text_file_strings}")
    print(f"DATA_FILES: {DATA_FILES}")
    Exit(0)
If we run SCons,
then we get:
OUTPUT
scons: Reading SConscript files ...
TEXT_FILES: ['books/abyss.txt', 'books/isles.txt', 'books/last.txt', 'books/sierra.txt']
DATA_FILES: ['abyss.dat', 'isles.dat', 'last.dat', 'sierra.dat']
Finally, we can update our count_words
function in
scons_lesson_configuration.py
to accept a list of data
files and reduce our CountWords
function calls to a single
instance in SConstruct
. We will have to collect the target
nodes returned by Command
and compile the full list of
target nodes to return from our pseudo-builder.
PYTHON
def count_words(env, data_files, language=LANGUAGE, count_source=COUNT_SOURCE):
    """Pseudo-builder to produce `.dat` targets from the `countwords.py` script

    Assumes that the source text files are found in `books/{data_file}.txt`

    :param env: SCons construction environment. Do not provide when using this
        function with the `env.AddMethod` and `env.CountWords` access style.
    :param data_files: List of string names of the data files to create.
    """
    target_nodes = []
    for data_file in data_files:
        data_path = pathlib.Path(data_file)
        text_file = pathlib.Path("books") / data_path.with_suffix(".txt")
        target_nodes.extend(
            env.Command(
                target=[data_file],
                source=[text_file, count_source],
                action=["${language} ${count_source} ${SOURCES[0]} ${TARGET}"],
                language=language,
                count_source=count_source,
            )
        )
    return target_nodes
PYTHON
env = Environment(ENV=os.environ.copy())
env.AddMethod(count_words, "CountWords")
env.CountWords(DATA_FILES)
env.Alias("dats", DATA_FILES)
Now, sierra.txt
is processed, too. If you update the
Alias
function call, we can process all .txt
files with the same dats
alias. The
COMMAND_LINE_TARGETS
loop is no longer required and may be
removed.
We get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/isles.txt isles.dat
python countwords.py books/last.txt last.dat
python countwords.py books/sierra.txt sierra.dat
scons: done building targets.
We can also rewrite results.txt
:
PYTHON
env.Command(
    target=["results.txt"],
    source=[ZIPF_SOURCE] + DATA_FILES,
    action=["${language} ${zipf_source} ${SOURCES[1:]} > ${TARGET}"],
    language=LANGUAGE,
    zipf_source=ZIPF_SOURCE,
)
If we re-run SCons:
We get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/isles.txt isles.dat
python countwords.py books/last.txt last.dat
python countwords.py books/sierra.txt sierra.dat
python testzipf.py abyss.dat isles.dat last.dat sierra.dat > results.txt
scons: done building targets.
Let’s check the results.txt
file:
OUTPUT
Book First Second Ratio
abyss 4044 2807 1.44
isles 3822 2460 1.55
last 12244 5566 2.20
sierra 4242 2469 1.72
So the ratios of occurrences of the two most frequent words in our books range from 1.44 to 2.20. That is in the general neighborhood of 2, the value predicted by Zipf’s Law, i.e., the most frequently-occurring word occurs approximately twice as often as the second most frequent word.
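The Ratio column can be re-derived by hand as a quick sanity check. The short sketch below copies the First and Second counts from the results.txt table above and recomputes each ratio, plus their mean; it is independent of testzipf.py and the variable names are chosen for illustration.

```python
# First and second most frequent word counts copied from results.txt above.
counts = {
    "abyss": (4044, 2807),
    "isles": (3822, 2460),
    "last": (12244, 5566),
    "sierra": (4242, 2469),
}

# Ratio of the most frequent word count to the second most frequent.
ratios = {book: first / second for book, (first, second) in counts.items()}
for book, ratio in ratios.items():
    print(f"{book:<8}{ratio:.2f}")

mean_ratio = sum(ratios.values()) / len(ratios)
print(f"mean    {mean_ratio:.2f}")
```

The recomputed values match the table, and the mean gives a single number to compare against the predicted value of 2.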
Here is our final SConstruct file:
PYTHON
import os
import pathlib

from scons_lesson_configuration import *

AddOption(
    "--variables",
    action="store_true",
    default=False,
    help="Print the files returned by Glob and exit (default: '%default')",
)
if GetOption("variables"):
    text_file_strings = [str(node) for node in TEXT_FILES]
    print(f"TEXT_FILES: {text_file_strings}")
    print(f"DATA_FILES: {DATA_FILES}")
    Exit(0)

env = Environment(ENV=os.environ.copy())
env.AddMethod(count_words, "CountWords")

env.CountWords(DATA_FILES)
env.Alias("dats", DATA_FILES)

env.Command(
    target=["results.txt"],
    source=[ZIPF_SOURCE] + DATA_FILES,
    action=["${language} ${zipf_source} ${SOURCES[1:]} > ${TARGET}"],
    language=LANGUAGE,
    zipf_source=ZIPF_SOURCE,
)
env.Default(["results.txt"])
and the supporting scons_lesson_configuration.py
module:
PYTHON
import pathlib

import SCons.Script

COUNT_SOURCE = "countwords.py"
LANGUAGE = "python"
ZIPF_SOURCE = "testzipf.py"
TEXT_FILES = SCons.Script.Glob("books/*.txt")
DATA_FILES = [
    pathlib.Path(str(text_file)).with_suffix(".dat").name
    for text_file in TEXT_FILES
]


def count_words(env, data_files, language=LANGUAGE, count_source=COUNT_SOURCE):
    """Pseudo-builder to produce `.dat` targets from the `countwords.py` script

    Assumes that the source text files are found in `books/{data_file}.txt`

    :param env: SCons construction environment. Do not provide when using this
        function with the `env.AddMethod` and `env.CountWords` access style.
    :param data_files: List of string names of the data files to create.
    """
    target_nodes = []
    for data_file in data_files:
        data_path = pathlib.Path(data_file)
        text_file = pathlib.Path("books") / data_path.with_suffix(".txt")
        target_nodes.extend(
            env.Command(
                target=[data_file],
                source=[text_file, count_source],
                action=["${language} ${count_source} ${SOURCES[0]} ${TARGET}"],
                language=language,
                count_source=count_source,
            )
        )
    return target_nodes
The following figure shows the dependencies embodied within our
SConstruct file, involved in building the results.txt
target, now that we have introduced our function:

Where We Are
This SConstruct file and this Python module contain all of our work so far.
Adding more books
We can now do a better job at testing Zipf’s rule by adding more books. The books we have used come from the Project Gutenberg website. Project Gutenberg offers thousands of free ebooks to download.
Exercise instructions:
- go to Project Gutenberg and use the search box to find another book, for example ‘The Picture of Dorian Gray’ from Oscar Wilde.
- download the ‘Plain Text UTF-8’ version and save it to the books folder; choose a short name for the file (that doesn’t include spaces) e.g. “dorian_gray.txt” because the filename is going to be used in the results.txt file
- optionally, open the file in a text editor and remove extraneous text at the beginning and end (look for the phrase END OF THE PROJECT GUTENBERG EBOOK [title])
- run scons and check that the correct commands are run, given the dependency tree
- check the results.txt file to see how this book compares to the others
Key Points
- SCons uses the Python programming language with access to all of Python’s many built-in functions.
- SCons provides many functions that work natively with the internal node objects required to manage the SCons directed graph.
- Use the SCons Glob function to get lists of SCons nodes from file names matching a pattern.
- Use Python built-in and standard library modules to manage file names and paths.
Content from Self-Documenting SConstruct files
Last updated on 2025-04-16
Estimated time: 10 minutes
Overview
Questions
- How should I document an SConscript file?
Objectives
- Write self-documenting SConstruct files with built-in help.
Many bash commands, and programs that people have written that can be
run from within bash, support a --help
flag to display more
information on how to use the commands or programs. In this spirit, it
can be useful, both for ourselves and for others, to provide a
--help
option in our SConstruct file. This can provide a
summary of the names of the key targets and what they do, so we don’t
need to look at the SConstruct file itself unless we want to.
SCons provides the common --help
flag and a
Help
function for building user customizable help messages.
The less common -H
flag will print the SCons help message.
For our SConstruct file, running with --help
option might
print:
OUTPUT
Local Options:
--variables Print the text files returned by Glob and exit (default: False)
Default Targets:
results.txt
Aliases:
dats
SCons already composes the help message for our custom command-line
options. So, how would we implement the rest? We could call
Help
like:
PYTHON
help_message = "\n\nDefault Targets:\n results.txt\n\nAliases:\n dats"
env.Help(help_message, append=True, keep_local=True)
But every time we add or remove a task, or change the default target
list, we would have to update the help message string manually. It would
be better if we could construct the list of default targets and aliases
from the configured tasks. We can use the SCons default_ans
and DEFAULT_TARGETS
variables. First update the imports at
the top of the scons_lesson_configuration.py
file
PYTHON
import pathlib
import SCons.Script
from SCons.Script import DEFAULT_TARGETS
from SCons.Node.Alias import default_ans
Then add new help message construction functions at the bottom of the
scons_lesson_configuration.py
file.
PYTHON
def return_help_content(nodes, message="", help_content=dict()):
    """Return a dictionary of {node: message} string pairs

    Helpful in constructing help content for :meth:`project_help`. Will not
    overwrite existing keys.

    :param nodes: SCons node objects, e.g. targets and aliases
    :param str message: Help message to assign to every node in nodes
    :param dict help_content: Optional dictionary with target help messages
        ``{target: help}``

    :returns: Dictionary of {node: message} string pairs
    :rtype: dict
    """
    new_help_content = {str(node): message for node in nodes}
    new_help_content.update(help_content)
    return new_help_content


def project_help(help_content=dict()):
    """Append the SCons help message with default targets and aliases

    Must come *after* all default targets and aliases are defined.

    :param dict help_content: Optional dictionary with target help messages
        ``{target: help}``
    """

    def add_content(nodes, message="", help_content=help_content):
        """Append a help message for all nodes using provided help content if
        available.

        :param nodes: SCons node objects, e.g. targets and aliases
        :param str message: Help message to assign to every node in nodes
        :param dict help_content: Optional dictionary with target help messages
            ``{target: help}``

        :returns: appended help message
        :rtype: str
        """
        keys = [str(node) for node in nodes]
        for key in keys:
            if key in help_content.keys():
                message += f"    {key}: {help_content[key]}\n"
            else:
                message += f"    {key}\n"
        return message

    defaults_message = add_content(
        DEFAULT_TARGETS, message="\nDefault Targets:\n"
    )
    alias_message = add_content(default_ans, message="\nTarget Aliases:\n")
    SCons.Script.Help(
        defaults_message + alias_message, append=True, keep_local=True
    )
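The merging behavior of return_help_content, in particular the "will not overwrite existing keys" rule, can be checked outside of SCons because the function only needs objects that convert to strings. In this sketch the function body is repeated from the lesson with a shortened docstring, and plain strings stand in for the SCons alias and target nodes.

```python
def return_help_content(nodes, message="", help_content=dict()):
    """Return {node: message} pairs without overwriting existing keys."""
    new_help_content = {str(node): message for node in nodes}
    new_help_content.update(help_content)
    return new_help_content


# Plain strings stand in for SCons alias and target nodes.
help_content = return_help_content(["dats"], "Count words in text files.")

# "dats" already has a message, so the second call leaves it untouched.
help_content = return_help_content(
    ["results.txt", "dats"],
    "Generate Zipf summary table.",
    help_content,
)
print(help_content)
```

Because the function builds and returns a new dictionary each time, the shared `dict()` default argument is never modified between calls.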
Finally, update the bottom of the SConstruct
file with
the new function calls. It is important that the
project_help
call comes after all default targets
are assigned and all aliases are created.
PYTHON
dats = env.Alias("dats", DATA_FILES)
help_content = return_help_content(
    dats,
    "Count words in text files.",
)

results = env.Command(
    target=["results.txt"],
    source=[ZIPF_SOURCE] + DATA_FILES,
    action=["${language} ${zipf_source} ${SOURCES[1:]} > ${TARGET}"],
    language=LANGUAGE,
    zipf_source=ZIPF_SOURCE,
)
help_content = return_help_content(
    results,
    "Generate Zipf summary table.",
    help_content,
)
env.Default(["results.txt"])

project_help(help_content)
If we now run
we get some SCons status messages, our help message, and the hint for the full SCons help message:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
Local Options:
--variables Print the files returned by Glob and exit (default: 'False')
Default Targets:
results.txt: Generate Zipf summary table.
Target Aliases:
dats: Count words in text files.
Use scons -H for help about SCons built-in command-line options.
If we add, change or remove a default target or alias, we will automatically see updated lists in our help messages.
Where We Are
This SConstruct file and this Python module contain all of our work so far.
Key Points
- Document SConstruct options, targets, and aliases with the SCons default_ans and DEFAULT_TARGETS variables and the Help function.
Content from Conclusion
Last updated on 2025-04-16
Estimated time: 35 minutes
Overview
Questions
- What are the advantages and disadvantages of using tools like SCons?
Objectives
- Understand advantages of automated build tools such as SCons.
Automated build tools such as SCons can help us in a number of ways. They help us to automate repetitive commands, hence saving us time and reducing the likelihood of errors compared with running these commands manually.
They can also save time by ensuring that automatically-generated artifacts (such as data files or plots) are only recreated when the files that were used to create these have changed in some way.
Through their notion of targets, sources, and actions, they serve as a form of documentation, recording dependencies between code, scripts, tools, configurations, raw data, derived data, plots, and papers.
Creating PNGs
Add new rules, update existing rules, and add new variables to:
- Create .png files from .dat files using plotcounts.py.
- Update the default target to include the .png files.
- Remove all auto-generated files (.dat, .png, results.txt).
This SConstruct file and this Python module contain a simple solution to this challenge.
Can you think of a way to reduce duplication in
count_words
and plot_counts
functions?
To remove all targets, use the SCons special ‘.’ target and the --clean flag.
The following figure shows the dependencies involved in building the ‘.’ or ‘all’ target, once we’ve added support for images:

Creating an Archive
Often it is useful to create an archive file of your project that includes all data, code and results. An archive file can package many files into a single file that can easily be downloaded and shared with collaborators. We can add steps to create the archive file inside the SConstruct itself so it’s easy to update our archive file as the project changes.
Edit the SConstruct to create an archive file of your project. Add new rules, update existing rules and add new variables to:
- Create a zipf_analysis.tar.gz archive including our code, data, plots, the Zipf summary table, and the SConstruct file with the SCons Tar builder.
- Update the default targets list so that it creates zipf_analysis.tar.gz.
- Print the values of any additional variables you have defined when scons --variables is called.
This SConstruct file and this Python module contain a simple solution to this challenge.
Archiving the SConstruct file
Why does the SCons task for the archive directory add the SConstruct to our archive of code, data, plots and Zipf summary table?
Our code files (countwords.py
,
plotcounts.py
, testzipf.py
) implement the
individual parts of our workflow. They allow us to create
.dat
files from .txt
files, and
results.txt
and .png
files from
.dat
files. Our SConstruct file, however, documents
dependencies between our code, raw data, derived data, and plots, as
well as implementing our workflow as a whole.
Key Points
- SCons and SConscript files save time by automating repetitive work, and save thinking by documenting how to reproduce results.