This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Intermediate Research Software Development: Glossary

Key Points

Setting the Scene
  • This lesson focuses on core, intermediate skills covering the whole software development life-cycle that will be of most use to anyone working collaboratively on code.

  • For code development in teams, you need more than just the right tools and languages. You also need a strategy (best practices) for how you will use these tools as a team.

  • The lesson follows on from the novice Software Carpentry lesson, but this is not a prerequisite for attending as long as you have some basic Python, command line and Git skills and you have been using them for a while to write code to help with your work.

Section 1: Setting Up Environment For Collaborative Code Development
  • To develop (write, test, debug, back up) code efficiently, you need to use a number of different tools.

  • When there is a choice of tools for a task you will have to decide which tool is right for you, which may be a matter of personal preference or what the team or community you belong to is using.

Introduction to Our Software Project
  • Programming interfaces define how individual modules within a software application interact among themselves or how the application itself interacts with its users.

  • MVC is a software design architecture which divides the application into three interconnected modules: Model (data), View (user interface), and Controller (input/output and data manipulation).

  • The software project we use throughout this course is an example of an MVC application that manipulates patients’ inflammation data and performs basic statistical analysis using Python.
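
The MVC split described above can be sketched in a few lines of Python. This is a minimal illustration, not the course project's actual code: the function names (daily_mean, format_table, run_analysis) and the sample readings are hypothetical.

```python
"""Minimal MVC sketch; every name here is hypothetical."""


def daily_mean(data):
    """Model: compute the mean of each column (day) in a 2D list of readings."""
    n_rows = len(data)
    return [sum(column) / n_rows for column in zip(*data)]


def format_table(title, values):
    """View: turn results into text for display; no computation happens here."""
    cells = ", ".join(f"{value:.1f}" for value in values)
    return f"{title}: {cells}"


def run_analysis(data):
    """Controller: pass the input to the model and hand the result to the view."""
    means = daily_mean(data)
    return format_table("Daily mean", means)


# Two patients (rows), three days (columns) of made-up readings.
readings = [[0, 2, 4], [2, 4, 6]]
print(run_analysis(readings))  # prints "Daily mean: 1.0, 3.0, 5.0"
```

Because the model knows nothing about presentation, the view could be swapped (for example, for a plot) without touching daily_mean.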

Virtual Environments For Software Development
  • Virtual environments keep Python versions and dependencies required by different projects separate.

  • A virtual environment is itself a directory structure.

  • Use venv to create and manage Python virtual environments.

  • Use pip to install and manage Python external (third-party) libraries.

  • pip allows you to declare all dependencies for a project in a separate file (by convention called requirements.txt) which can be shared with collaborators/users and used to replicate a virtual environment.

  • Use python3 -m pip freeze > requirements.txt to take a snapshot of your project’s dependencies.

  • Use python3 -m pip install -r requirements.txt to replicate someone else’s virtual environment on your machine from the requirements.txt file.
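
To see that a virtual environment is just a directory structure, the sketch below creates one with the standard library's venv module (the same machinery behind python3 -m venv) and lists what is inside. The temporary location and variable names are assumptions made for the illustration; the exact directory names differ between operating systems.

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway environment in a temporary directory (hypothetical location).
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "venv"
    # with_pip=False keeps creation fast and offline;
    # the `python3 -m venv` command installs pip by default.
    venv.create(env_dir, with_pip=False)

    # The environment is an ordinary directory tree with a small
    # configuration file (pyvenv.cfg) at its root.
    top_level = sorted(entry.name for entry in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()

print(top_level)
print("pyvenv.cfg present:", has_config)
```

Deleting the directory removes the environment, which is why a requirements.txt file, rather than the environment itself, is what gets shared with collaborators.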

Integrated Software Development Environments
  • An IDE is an application that provides a comprehensive set of facilities for software development, including syntax highlighting, code search and completion, version control, testing and debugging.

  • PyCharm recognises virtual environments configured from the command line using venv and pip.

Software Development Using Git and GitHub
  • A branch is one version of your project that can contain its own set of commits.

  • Feature branches enable us to develop / explore / test new code features without affecting the stable main code.

Python Code Style Conventions
  • Always assume that someone else will read your code at a later date, including yourself.

  • Community coding conventions help you create more readable software projects that are easier to contribute to.

  • Python Enhancement Proposals (or PEPs) describe a recommended convention or specification for how to do something in Python.

  • Style checking to ensure code conforms to coding conventions is often part of IDEs.

  • Consistency with the style guide is important - whichever style you choose.

Verifying Code Style Using Linters
  • Use linting tools on the command line (or via continuous integration) to automatically check your code style.

Section 2: Ensuring Correctness of Software at Scale
  • Using testing requires us to change our practice of code development, but saves time in the long run by allowing us to more comprehensively and rapidly find faults in code, as well as giving us greater confidence in the correctness of our code.

  • The use of test techniques and infrastructures such as parametrisation and Continuous Integration can help scale and further automate our testing process.

Automatically Testing Software
  • The three main types of automated tests are unit tests, functional tests and regression tests.

  • We can write unit tests to verify that functions generate expected output given a set of specific inputs.

  • It should be easy to add or change tests, understand and run them, and understand their results.

  • We can use a unit testing framework like Pytest to structure and simplify the writing of tests in Python.

  • We should test for expected errors in our code.

  • Testing program behaviour against both valid and invalid inputs is important and is known as data validation.
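
The points above can be illustrated with a small sketch: one test checks a known output for a specific input, and another checks that invalid input raises the expected error. The function daily_max and the test names are hypothetical. The tests use plain assert statements so the example runs without any third-party package; with Pytest installed, the error check would more idiomatically use pytest.raises.

```python
def daily_max(data):
    """Return the maximum of each column (day) in a 2D list of readings."""
    if not data:
        raise ValueError("data must contain at least one row")
    return [max(column) for column in zip(*data)]


def test_daily_max_known_input():
    # A specific input with a known expected output.
    assert daily_max([[1, 5], [3, 2]]) == [3, 5]


def test_daily_max_rejects_empty_input():
    # Invalid input should raise the documented error, not return a value.
    try:
        daily_max([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


# Pytest would discover and run the test_* functions automatically;
# here they are called directly so the sketch is self-contained.
test_daily_max_known_input()
test_daily_max_rejects_empty_input()
print("all tests passed")
```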

Scaling Up Unit Testing
  • We can assign multiple inputs to tests using parametrisation.

  • It’s important to understand the coverage of our tests across our code.

  • Writing unit tests takes time, so apply them where it makes the most sense.
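
As a sketch of parametrisation, the example below runs one test function against several input/expected pairs using Pytest's parametrize marker. It assumes Pytest is installed; daily_min and the data values are hypothetical.

```python
import pytest


def daily_min(data):
    """Return the minimum of each column (day) in a 2D list of readings."""
    return [min(column) for column in zip(*data)]


@pytest.mark.parametrize(
    "data, expected",
    [
        ([[0, 0], [0, 0]], [0, 0]),       # all zeros
        ([[1, 2], [3, 4]], [1, 2]),       # positive values
        ([[4, -2], [-1, 6]], [-1, -2]),   # mixed signs
    ],
)
def test_daily_min(data, expected):
    # Pytest calls this function once per (data, expected) pair above.
    assert daily_min(data) == expected
```

Each pair is reported as a separate test case, so a failure points directly at the input that caused it.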

Continuous Integration for Automated Testing
  • Continuous Integration can run tests automatically to verify changes as code develops in our repository.

  • CI builds are typically triggered by commits pushed to a repository.

  • We need to write a configuration file to inform a CI service what to do for a build.

  • We can define a build matrix to specify multiple platforms and programming language versions to test against.

  • Builds can be enabled and configured separately for each branch.

  • We can run - and get reports from - different CI infrastructure builds simultaneously.

Diagnosing Issues and Improving Robustness
  • Unit testing can show us what does not work, but does not help us locate problems in code.

  • Use a debugger to help you locate problems in code.

  • A debugger allows us to pause code execution and examine its state by adding breakpoints to lines in code.

  • Use preconditions to ensure correct behaviour of code.

  • Ensure that unit tests check for edge and corner cases too.

  • Using linting tools to automatically flag suspicious programming language constructs and stylistic errors can help improve code robustness.
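
A minimal sketch of preconditions and an edge case, assuming a hypothetical function normalise_rows that scales each patient's readings by that patient's maximum. The preconditions reject input the function cannot handle sensibly, and the all-zero row is an edge case that would otherwise cause a division by zero.

```python
def normalise_rows(data):
    """Scale each row of non-negative readings to the range 0 to 1."""
    # Preconditions: fail early, with a clear message, on invalid input.
    if not data or any(len(row) == 0 for row in data):
        raise ValueError("data must contain at least one non-empty row")
    if any(value < 0 for row in data for value in row):
        raise ValueError("readings must be non-negative")

    normalised = []
    for row in data:
        row_max = max(row)
        if row_max == 0:
            # Edge case: an all-zero row stays all zero instead of dividing by zero.
            normalised.append([0.0 for _ in row])
        else:
            normalised.append([value / row_max for value in row])
    return normalised


print(normalise_rows([[1, 2, 4], [0, 0, 0]]))
# prints [[0.25, 0.5, 1.0], [0.0, 0.0, 0.0]]
```

Unit tests for such a function would cover both the ValueError paths and the all-zero row, not only typical input.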

Section 3: Software Development as a Process
  • Software engineering takes a wider view of software development beyond programming (or coding).

  • Ensuring requirements are sufficiently captured is critical to the success of any project.

  • Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next.

Software Requirements
  • When writing software used for research, requirements will almost always change.

  • Consider non-functional requirements (how the software will behave) as well as functional requirements (what the software is supposed to do).

  • The environment in which users run our software has an effect on many design choices we might make.

  • Consider the expected longevity of any code before you write it.

  • The perspective and language of a particular requirement stakeholder group should be reflected in requirements for that group.

Software Architecture and Design
  • Planning software projects in advance can save a lot of effort and reduce ‘technical debt’ later - even a partial plan is better than no plan at all.

  • By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right.

  • When writing software used for research, requirements will almost always change.

  • ‘Good code is written so that it is readable, understandable, covered by automated tests, not overcomplicated, and does well what it is intended to do.’

Programming Paradigms
  • A software paradigm describes a way of structuring or reasoning about code.

  • Different programming languages are suited to different paradigms.

  • Different paradigms are suited to solving different classes of problems.

  • A single piece of software will often contain instances of multiple paradigms.

Functional Programming
  • Functional programming is a programming paradigm where programs are constructed by applying and composing smaller, simpler functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations).

  • In functional programming, functions tend to be pure: they have no side effects, so they do not change anything outside the function and their only result is the value they return. Functions can also be named, passed as arguments, and returned from other functions, just like any other data type.

  • MapReduce is an instance of a data generation and processing approach, in particular suited for functional programming and handling Big Data within parallel and distributed environments.

  • Python provides comprehensions for lists, dictionaries, sets and generators - a concise (if not strictly functional) way to generate new data from existing data collections while performing sophisticated mapping, filtering and conditional logic on the original dataset’s members.
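
The ideas above can be sketched with the standard library alone: a pure function, functions passed to and returned from other functions, a map step followed by a reduce step, and a comprehension that maps and filters in one expression. The names square and compose are hypothetical helpers written for this illustration.

```python
from functools import reduce


def square(x):
    """Pure function: the result depends only on the argument, with no side effects."""
    return x * x


def compose(f, g):
    """Return a new function that applies g first, then f."""
    return lambda x: f(g(x))


numbers = [1, 2, 3, 4, 5]

# Map step: transform every element. Reduce step: combine the results into one value.
squares = list(map(square, numbers))
total = reduce(lambda acc, value: acc + value, squares, 0)

# Comprehension: mapping and filtering in a single expression.
even_squares = [square(n) for n in numbers if n % 2 == 0]

# Functions are values: build a new one from two existing ones.
increment_then_square = compose(square, lambda x: x + 1)

print(squares)                   # prints [1, 4, 9, 16, 25]
print(total)                     # prints 55
print(even_squares)              # prints [4, 16]
print(increment_then_square(2))  # prints 9
```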

Object Oriented Programming
  • Object oriented programming is a programming paradigm based on the concept of classes, which encapsulate data and code.

  • Classes allow us to organise data into distinct concepts.

  • By breaking down our data into classes, we can reason about the behaviour of parts of our data.

  • Relationships between concepts can be described using inheritance (is a) and composition (has a).
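
A minimal sketch of the two relationships, using hypothetical classes: a Patient is a Person (inheritance) and has Observations (composition).

```python
class Observation:
    """A single reading taken on a given day."""

    def __init__(self, day, value):
        self.day = day
        self.value = value


class Person:
    """Data and behaviour shared by every kind of person."""

    def __init__(self, name):
        self.name = name


class Patient(Person):
    """A Patient is a Person (inheritance) that has Observations (composition)."""

    def __init__(self, name):
        super().__init__(name)
        self.observations = []

    def add_observation(self, value):
        # Days are numbered in the order the observations are added.
        day = len(self.observations)
        self.observations.append(Observation(day, value))

    def mean_value(self):
        if not self.observations:
            raise ValueError("no observations recorded")
        return sum(obs.value for obs in self.observations) / len(self.observations)


alice = Patient("Alice")
alice.add_observation(3)
alice.add_observation(5)
print(alice.name, alice.mean_value())  # prints "Alice 4.0"
```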

Architecture Revisited: Extending Software
  • By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right.

Section 4: Collaborative Software Development for Reuse
  • Agreeing on a set of best practices within a software development team will help to improve your software’s understandability, extensibility, testability, reusability and overall sustainability.

Developing Software In a Team: Code Review
  • Code review is a team software quality assurance practice where team members look at parts of the codebase in order to improve their code’s readability, understandability, quality and maintainability.

  • It is important to agree on a set of best practices and establish a code review process in a team to help sustain good, stable and maintainable code for many years.

Preparing Software for Reuse and Release
  • The reuse battle is won before it is fought. Select and use good practices consistently throughout development and not just at the end.

Packaging Code for Release and Distribution
  • Poetry allows us to produce an installable package and upload it to a package repository.

  • Making our software installable with pip makes it easier for others to start using it.

  • For complete control over building a package, we can use a setup.py file.

Section 5: Managing and Improving Software Over Its Lifetime
  • For software to succeed it needs to be managed as well as developed.

  • Estimating the effort to deliver work items is a foundational tool for prioritising that work.

Managing a Collaborative Software Project
  • We should use GitHub’s Issues to keep track of software problems and other requests for change - even if we are the only developer and user.

  • GitHub’s Mentions play an important part in communication between collaborators and are used to alert team members to activities and to reference one issue/pull request/comment/commit from another.

  • Without a good project and issue management framework, it can be hard to keep track of what’s done, or what needs doing, and particularly difficult to convey that to others in the team or to those sharing the responsibilities.

Assessing Software for Suitability and Improvement
  • It’s as important to have a critical attitude to adopting software as it is to developing it.

  • As a team, agree on whom you will support, and to what extent, for the software you make available to others.

Software Improvement Through Feedback
  • Prioritisation is a key tool in academia where research goals can change and software development is often given short shrift.

  • In order to prioritise things to do we must first estimate the effort required to do them.

  • Effort estimation is most accurate when it is done by the people who will actually do the work.

  • Aim to reduce cognitive biases in effort estimation by being honest about your abilities.

  • Ask other team members - or do estimation as a team - to help make accurate estimates.

  • MoSCoW is a useful technique for prioritising work to help ensure projects deliver successfully.

  • Aim for a 60%/20%/20% ratio of Must Haves/Should Haves/Could Haves for requirements within a timebox.
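
As a rough sketch of checking a timebox against the 60%/20%/20% guideline, the function below sums estimated effort per MoSCoW category and reports each category's share. It assumes the ratio is measured by estimated effort rather than by number of requirements; the function name, work items and estimates are all hypothetical.

```python
def effort_share(items):
    """Return the percentage of total estimated effort per priority category."""
    totals = {"must": 0, "should": 0, "could": 0}
    for _name, priority, effort in items:
        totals[priority] += effort
    grand_total = sum(totals.values())
    return {priority: 100 * effort / grand_total for priority, effort in totals.items()}


# (work item, priority, estimated effort in days) - made-up values.
timebox = [
    ("load data", "must", 4),
    ("compute statistics", "must", 2),
    ("plot results", "should", 2),
    ("export to CSV", "could", 2),
]

print(effort_share(timebox))
# prints {'must': 60.0, 'should': 20.0, 'could': 20.0}
```

A share of Must Haves well above 60% suggests the timebox has little slack if estimates turn out to be optimistic.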

Wrap-up
  • Collaborative techniques and tools play an important part in research software development in teams.
