Dependencies on Data and Code
Last updated on 2025-04-16
Estimated time: 20 minutes
Overview
Questions
- How can I write an SConscript file to update things when my scripts have changed rather than my input files?
Objectives
- Output files are a product not only of input files but of the scripts or code that created the output files.
- Recognize and avoid false dependencies.
Our SConstruct file now looks like this:
PYTHON
import os
env = Environment(ENV=os.environ.copy())
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["last.dat"],
    source=["books/last.txt"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Alias("dats", ["isles.dat", "abyss.dat", "last.dat"])
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat"],
    action=["python testzipf.py ${SOURCES} > ${TARGET}"],
)
env.Default(["results.txt"])
Our data files are produced using not only the input text files but also the script countwords.py that processes the text files and creates the data files. A change to countwords.py (e.g. adding a new column of summary data or removing an existing one) results in changes to the .dat files it outputs. So, let’s pretend to edit countwords.py, using echo to append a blank line, and re-run SCons:
Nothing happens! Though we’ve updated countwords.py, our data files are not updated because our rules for creating .dat files don’t record any dependencies on countwords.py.
We also need to add countwords.py as a dependency of each of our data files:
PYTHON
env.Command(
    target=["isles.dat"],
    source=["books/isles.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["abyss.dat"],
    source=["books/abyss.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
env.Command(
    target=["last.dat"],
    source=["books/last.txt", "countwords.py"],
    action=["python countwords.py ${SOURCES[0]} ${TARGET}"],
)
If we re-run SCons, we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: done building targets.
SCons tracks the source list as part of the task signature, so adding a new source triggers a rebuild of the targets. Now if we edit the countwords.py file, the targets will be rebuilt again.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: done building targets.
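One way to picture why both adding a source and editing it trigger a rebuild is the minimal sketch below. It is not how SCons is implemented internally; the task_signature helper and the in-memory file contents are hypothetical, chosen only to illustrate a signature that covers both the list of sources and their contents.

```python
import hashlib


def task_signature(sources, contents):
    """Hypothetical signature: hash each source name and its content, in order."""
    digest = hashlib.sha256()
    for name in sources:
        digest.update(name.encode())
        digest.update(hashlib.sha256(contents[name].encode()).digest())
    return digest.hexdigest()


# Illustrative stand-ins for file contents (assumed, not the lesson's real files)
contents = {
    "books/isles.txt": "some text",
    "countwords.py": "print('count')\n",
}

before = task_signature(["books/isles.txt"], contents)
after_adding = task_signature(["books/isles.txt", "countwords.py"], contents)

contents["countwords.py"] += "\n"  # mimic appending a blank line to the script
after_editing = task_signature(["books/isles.txt", "countwords.py"], contents)

print(before != after_adding)         # adding a source changes the signature
print(after_adding != after_editing)  # editing the script changes it again
```

In this sketch a rebuild is needed whenever the stored signature differs from the freshly computed one, which matches the two rebuilds shown in the output above.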
Dry run
scons can show the commands it will execute without actually running them if we pass the --dry-run flag:
This gives the same output to the screen as without the --dry-run flag, but the commands are not actually run. Using this ‘dry-run’ mode is a good way to check that you have set up your SConscript tasks properly before actually running the commands.
You can also get an explanation for why SCons would like to recreate the targets with the --debug=explain option. This is helpful when the dry run shows commands you did not expect to run and you need help tracking down the incorrect task definition.
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: rebuilding `isles.dat' because `countwords.py' changed
python countwords.py books/isles.txt isles.dat
scons: rebuilding `abyss.dat' because `countwords.py' changed
python countwords.py books/abyss.txt abyss.dat
scons: rebuilding `last.dat' because `countwords.py' changed
python countwords.py books/last.txt last.dat
scons: done building targets.
The following figure shows a graph of the dependencies that are involved in building the target results.txt. Notice the recently added dependencies countwords.py and testzipf.py. This is how the SConstruct should look after completing the rest of the exercises in this episode.
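The same dependency structure can be written down as a plain Python dictionary, which makes it easy to ask which targets sit downstream of a changed file. This is only an illustrative sketch: the dependencies mapping mirrors the sources named in this episode, and the affected_targets helper is a hypothetical name, not part of SCons.

```python
# Direct dependencies as described in this episode: target -> list of sources
dependencies = {
    "results.txt": ["testzipf.py", "isles.dat", "abyss.dat", "last.dat"],
    "isles.dat": ["books/isles.txt", "countwords.py"],
    "abyss.dat": ["books/abyss.txt", "countwords.py"],
    "last.dat": ["books/last.txt", "countwords.py"],
}


def affected_targets(changed, dependencies):
    """Return every target that directly or indirectly depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, sources in dependencies.items():
            if node in sources and target not in affected:
                affected.add(target)
                frontier.append(target)  # follow the chain further downstream
    return affected


print(sorted(affected_targets("countwords.py", dependencies)))
# ['abyss.dat', 'isles.dat', 'last.dat', 'results.txt']
print(sorted(affected_targets("books/last.txt", dependencies)))
# ['last.dat', 'results.txt']
```

The sketch lists targets that are candidates for rebuilding; whether each one is actually rebuilt depends on how the build manager decides that a source has changed.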

Why Don’t the .txt Files Depend on countwords.py?
.txt files are input files and as such have no dependencies. Making them depend on countwords.py would introduce a false dependency, which is not desirable.
Intuitively, we should also add countwords.py as a dependency for results.txt, because the final table should be rebuilt if we remake the .dat files. However, it turns out we don’t have to do that! Let’s see what happens to results.txt when we update countwords.py:
then we get:
OUTPUT
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python countwords.py books/isles.txt isles.dat
python countwords.py books/abyss.txt abyss.dat
python countwords.py books/last.txt last.dat
scons: `results.txt' is up to date.
scons: done building targets.
The whole pipeline is triggered, starting with the .dat files and finishing with the requested results.txt file! To understand this, note that according to the dependency figure, results.txt depends on the .dat files. The update of countwords.py triggers an update of the *.dat files. Finally, SCons reports that the requested results.txt file is still up to date.
In a timestamp-based build manager, such as Make, the build manager would see that the dependencies (the .dat files) are newer than the target file (results.txt) and would therefore recreate results.txt.
With SCons, the results.txt file is not recreated because the recreated intermediate .dat files contain the same content as the first time we created the results.txt file, which is therefore not out-of-date.
Both behaviors are examples of the power of a build manager: updating a subset of the files in the pipeline triggers rerunning the appropriate downstream steps. The additional SCons behavior of stopping a pipeline early when intermediate file content has not changed is desirable for computational science and engineering workflows, where some tasks may require hours to days to complete.
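The difference between the two decision rules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual logic of Make or SCons: the function names, the modification times, and the .dat content are all made up for the example.

```python
import hashlib


def content_hash(text):
    """Hash of a file's content (MD5 used here purely for illustration)."""
    return hashlib.md5(text.encode()).hexdigest()


def timestamp_rule(target_mtime, source_mtimes):
    """Timestamp-style check: rebuild if any source is newer than the target."""
    return any(mtime > target_mtime for mtime in source_mtimes)


def content_rule(recorded_hashes, current_contents):
    """Content-style check: rebuild only if a source's content hash changed."""
    return any(
        content_hash(text) != recorded_hashes[name]
        for name, text in current_contents.items()
    )


# Hypothetical state: isles.dat was regenerated later than results.txt,
# but the regenerated file has exactly the same content as before.
dat_contents = {"isles.dat": "word 10 5.0\n"}  # illustrative content only
recorded = {name: content_hash(text) for name, text in dat_contents.items()}
results_mtime = 100  # made-up modification times
isles_mtime = 200

print(timestamp_rule(results_mtime, [isles_mtime]))  # True: newer source, rebuild
print(content_rule(recorded, dat_contents))          # False: same content, skip
```

With the content-based rule, the expensive downstream step is skipped whenever an upstream step reproduces identical output.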
1. only last.dat is recreated.
Follow the dependency tree and consider the effect of an empty line on the word count calculations to understand the answer(s).
testzipf.py as a Dependency of results.txt
What would happen if you added testzipf.py as a dependency of results.txt, and why?
If you change the rule for the results.txt file like this:
PYTHON
env.Command(
    target=["results.txt"],
    source=["isles.dat", "abyss.dat", "last.dat", "testzipf.py"],
    action=["python testzipf.py ${SOURCES} > ${TARGET}"],
)
testzipf.py becomes a part of ${SOURCES}, so the post-substitution command becomes:
python testzipf.py isles.dat abyss.dat last.dat testzipf.py > results.txt
This results in an error from testzipf.py as it tries to parse the script as if it were a .dat file. Try this by running:
You’ll get
ERROR
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
python testzipf.py isles.dat abyss.dat last.dat testzipf.py > results.txt
Traceback (most recent call last):
  File "/home/roppenheimer/scons-lesson/testzipf.py", line 19, in <module>
    counts = load_word_counts(input_file)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roppenheimer/scons-lesson/countwords.py", line 39, in load_word_counts
    counts.append((fields[0], int(fields[1]), float(fields[2])))
                              ^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'countwords'
scons: *** [results.txt] Error 1
scons: building terminated because of errors.
We still have to add the testzipf.py script as a dependency of results.txt. Given the answer to the challenge above, we need to make a couple of small changes so that we can still use special substitution variables.
We’ll move testzipf.py to be the first source. We could then edit the action so that we pass all the dependencies as arguments to python using ${SOURCES}.
PYTHON
env.Command(
    target=["results.txt"],
    source=["testzipf.py", "isles.dat", "abyss.dat", "last.dat"],
    action=["python ${SOURCES} > ${TARGET}"],
)
But it would be helpful to clarify the unique role of testzipf.py as a Python script. We can clarify the intended roles for different source files by indexing the sources in our action. SCons allows for Pythonic slicing when indexing special substitution variables.
PYTHON
env.Command(
    target=["results.txt"],
    source=["testzipf.py", "isles.dat", "abyss.dat", "last.dat"],
    action=["python ${SOURCES[0]} ${SOURCES[1:]} > ${TARGET}"],
)
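The indexing in the action above follows the same rules as slicing an ordinary Python list. The sketch below is only an analogy built with plain Python strings: SCons performs its own substitution, and the variable names here are chosen for illustration.

```python
# The source list from the rule above, as an ordinary Python list
sources = ["testzipf.py", "isles.dat", "abyss.dat", "last.dat"]
target = "results.txt"

script = sources[0]       # plays the role of ${SOURCES[0]}
data_files = sources[1:]  # plays the role of ${SOURCES[1:]}

# Assemble the command line the way the action template reads
command = f"python {script} {' '.join(data_files)} > {target}"
print(command)
# python testzipf.py isles.dat abyss.dat last.dat > results.txt
```

Keeping the script in position 0 and the data files in the slice makes the role of each source explicit, and the script name no longer ends up among the data arguments.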
Index the .dat task actions
Index the sources for .dat task actions without changing the source file order. Remember that SCons allows Pythonic slicing when indexing special substitution variables.
Where We Are
This SConstruct file contains everything done so far in this topic.
Key Points
- SCons results depend on processing scripts as well as data files.
- Dependencies are transitive: if A depends on B and B depends on C, a change to C will indirectly trigger the pipeline to update A.
- SCons content signatures help prevent recomputing work if intermediate targets’ contents do not change after recreation.