This lesson is in the early stages of development (Alpha version)

Python Testing and Continuous Integration

In this lesson we use a Python library called pytest. Basic understanding of Python variables and functions are a necessary prerequisite. Some previous experience with the shell is expected, but isn’t mandatory.

Prerequisites

Nothing to do: you’re ready to go!

Before relying on a new experimental device, an experimental scientist always establishes its accuracy. A new detector is calibrated when the scientist observes its responses to known input signals. The results of this calibration are compared against the expected response. An experimental scientist would never conduct an experiment with uncalibrated detectors - that would be unscientific. So too, simulations and analysis with untested software do not constitute science.

You only know what you test

You can only know by testing it. Software bugs are hiding in all nontrivial software. Testing is the process by which those bugs are systematically exterminated before they have a chance to cause a paper retraction. In software tests, just like in device calibration, expected results are compared with observed results in order to establish accuracy.

The collection of all of the tests for a given code is known as the test suite. You can think of the test suite as a bunch of pre-canned experiments that anyone can run. If all of the test pass, then the code is at least partially trustworthy. If any of the tests fail then the code is known to be incorrect with respect to whichever case failed. After this lesson, you will know to not trust software when its tests do not cover its claimed capabilities and when its tests do not pass.

Managing Expectations

In the same way that your scientific domain has expectations concerning experimental accuracy, it likely also has expectations concerning allowable computational accuracy. These considerations should surely come into play when you evaluate the acceptability of your own or someone else’s software.

In most other programming endeavors, if code is fundamentally wrong - even for years at a time - the impact of this error can be relatively small. Perhaps a website goes down, or a game crashes, or a days worth of writing is lost to a bug in your word processor. Scientific code, on the other hand, controls planes, weapons systems, satellites, agriculture, and most importantly scientific simulations and experiments. If the software that governs the computational or physical experiment is wrong, then disasters (such as false claims in a publication) will result.

This is not to say that scientists have a monopoly on software testing, simply that software cannot be called scientific unless it has been validated.

Code without tests… is legacy code!

In Working Effectively with Legacy Code, Michael Feathers defines legacy code as “any code without tests”. This definition draws on the fact that after its initial creation, tests provide a powerful guide to other developers (and to your forgetful self, a few months in the future) about how each function is meant to be used. Without runnable tests to provide examples of code use, even brand new programs are unsustainable.

Testing is the calibration step of the computational simulation and analysis world: it lets the scientist trust their own work on a fundamental level and helps others to understand and trust their work as well. Furthermore, tests help you to never fix a bug a second time. Once a bug has been caught and a test has been written, that particular bug can never again re-enter the codebase unnoticed. So, whether motivated by principles or a desire to work more efficiently, all scientists can benefit from testing.

Where these lessons are from

Note that this testing lesson was adapted from the Testing chapter in Effective Computation In Physics by Anthony Scopatz and Kathryn Huff. It is often quoted directly.

Schedule

Setup Download files required for the lesson
00:00 1. Basics of Testing Why test?
00:05 2. Assertions How can we compare observed and expected values?
00:15 3. Exceptions How do I handle unusual behavior while the code runs?
00:25 4. Design by Contract What is Design by Contract?
00:40 5. Unit Tests What is a unit of code?
00:50 6. Running Tests with pytest How do I automate my tests?
01:00 7. Edge and Corner Cases How do I catch all the possible errors?
01:10 8. Integration and Regression Tests How do we test more than a single unit of software?
01:20 9. Continuous Integration How can I automate running the tests on more platforms than my own?
01:30 10. Test Driven Development How do you make testing part of the code writing process?
01:40 11. Fixtures How do I create and cleanup the data I need to test the code?
01:50 12. Key Points and Glossary
01:50 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.