Setting up a Project

Overview

Teaching: 0 min
Exercises: 0 min

Questions

How do I set up a project in practice?

What organization will help support the goals of my project?

What additional infrastructure will support opening my project?

Objectives

Create a project structure

Save helper excerpts of code

Project Organization

Now that we’ve brainstormed the parts of a project and talked a little about what each of them consists of, how should we organize the code to help our future selves and collaborators?

There isn’t a single right answer, but there are some guiding principles. There are also packages that create a basic setup for you. These can help you get started, especially if you are building something that follows many standard conventions, but they do not help you reorganize your existing code.

We begin this section by talking about how to start from scratch, although in reality you often already have code that you want to sort and organize to make it more functional. We start from a clean slate to introduce the ideas and concepts, and then we’ll return to how to sort and organize existing code into the bins we created.

Exercise

Let’s look around on GitHub for some examples and compare and contrast them.

Here are some ideas to consider:

Detecting Simpson’s Paradox

Wiggum

MAD-bayes

Questions

What files and directory structures are common?

Which ones do you think you could get started with right away?

What different goals do they seem to be organized for?

Next, we will think about which of these ideas suit our own projects and discuss some specific advice on each topic.

File Naming

This is the lowest-effort step you can take to make your code more reusable. Naming things is an important aspect of programming. This Data Carpentry episode provides some useful principles for file naming.

These are the three main characteristics of a good file name:

Machine readable
Human readable
Plays well with default ordering
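To make these three characteristics concrete, here is a small Python sketch. It assumes one possible convention (an ISO 8601 date first, a zero-padded sequence number, then lowercase words joined by hyphens); the helper name make_filename is hypothetical and not part of any library.

```python
import re
from datetime import date


def make_filename(day: date, number: int, description: str, ext: str) -> str:
    """Build a machine- and human-readable file name (hypothetical helper).

    - ISO 8601 date first, so names sort chronologically
    - zero-padded number, so "02" sorts before "10"
    - lowercase words joined by hyphens, no spaces or special characters
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{number:02d}_{slug}.{ext}"


# Without zero-padding, default (lexical) ordering puts "10" before "2".
print(sorted(["fig_10.png", "fig_2.png", "fig_1.png"]))
# ['fig_1.png', 'fig_10.png', 'fig_2.png']

names = [
    make_filename(date(2023, 3, 1), 10, "Temperature Map", "png"),
    make_filename(date(2023, 3, 1), 2, "Raw counts (filtered)", "png"),
]
print(sorted(names))
# ['2023-03-01_02_raw-counts-filtered.png', '2023-03-01_10_temperature-map.png']
```

The generated names contain no spaces or punctuation that trips up scripts, still say what the file holds, and list in the intended order with a plain directory listing.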

Guiding Principles

There are numerous resources on good practices for starting and developing your project, such as:

In this lesson, we are going to create a project that attempts to abide by the guiding principles presented in these resources.

Setting up a project

Sometimes we get to start from scratch, so we can set up everything from the beginning.

Templates

For some types of projects there are tools that generate the base structure for you. These tools are sometimes called “cookie cutters” or simply project templates. They are available in a variety of languages, and some examples include:

Shablona

NLeSC Python template

Cookiecutter command-line utility

For our lesson, we will be manually creating a small project. However, it will be similar to the examples above.

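git clone <repository-url>   # placeholder: the URL of your project's repository
cd project                   # assumes the repository is named "project"
mkdir data                   # data files
mkdir docs                   # documentation
mkdir experiments            # experiment and analysis scripts
mkdir package                # the package's source code
touch setup.py               # packaging script that makes the project installable
touch README.md              # first information new users see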

We will also add a .gitignore file, which lists the files and folders that git should not track. In general, data is ignored, but scripts that download or process the data in some way are good to keep. Results should also be ignored.
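The same skeleton can also be created from Python, which is handy if you want a reusable setup script. This is a minimal sketch using only the standard library pathlib; the function name create_skeleton and the exact .gitignore patterns are assumptions. In particular, the results/ folder is hypothetical (it is not part of the layout above) and stands in for wherever your generated results end up.

```python
from pathlib import Path

# Folders and empty files created by the shell commands above.
FOLDERS = ["data", "docs", "experiments", "package"]
FILES = ["setup.py", "README.md"]
# Assumed ignore rules: data and generated results stay out of git, while the
# scripts that download or process the data (kept elsewhere) remain tracked.
GITIGNORE = "data/\nresults/\n"


def create_skeleton(root: Path) -> None:
    """Create the lesson's project layout under `root` (hypothetical helper)."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch()  # creates the file, or updates its timestamp if it exists
    gitignore = root / ".gitignore"
    if not gitignore.exists():  # never overwrite an existing .gitignore
        gitignore.write_text(GITIGNORE)
```

Run from inside the cloned repository, create_skeleton(Path(".")) produces the same folders and files as the shell commands, plus a starter .gitignore.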

Exercise

Create each of the following files in the correct location by replacing the __ on each line:

touch __/raw_data.csv # raw data for processing
touch __/generate_figures.py # functions to create figures for presentation/publication
touch __/new_technique.py # contains the novel method at the core of your publication
touch __/reproduce_paper.py # code to re-run the analyses reported in your methods paper about the package
touch __/helper_functions.py # auxiliary functions for routine tasks associated with the novel method
touch __/how_to_setup.md # details to help others prepare equivalent experiments to those presented in your paper

Solution

touch data/raw_data.csv
touch experiments/generate_figures.py
touch package/new_technique.py
touch experiments/reproduce_paper.py
touch package/helper_functions.py
touch docs/how_to_setup.md

Exercise

Label each of the following excerpts with where it belongs in the project:

excerpt 1

Getting Started
----------------

to install

excerpt 2

import os

for data_file in file_list:
    proc_data = pkg.preprocess(data_file)
    # drop the ".csv" extension before appending the suffix
    proc_data.to_csv(os.path.splitext(data_file)[0] + '_proc.csv')
    pkg.new_method(proc_data)

excerpt 3

import pandas as pd

df = pd.read_csv(data_file)
df.head()
df.describe()

excerpt 4

This technique involves the best new analysis technique ever
the background to understand the technique is these three things

Open Source Basics, MWE

Open source guidelines are generally written to be ready to scale. Here we distinguish the basics needed to get your project live and usable from the things that will help if it grows and builds a community, but that are not needed right away.

README

A README file is the first information about your project most people will see. It should encourage people to start using it and cover key steps in that process. It includes key information, such as:

What the project does
Why the project is useful
How users can get started with the project
Where users can get help with the project
Who maintains and contributes to the project
How to repeat the analysis (if it is a data project)

If you are not sure what to put in your README, these bullet points are a good starting point. There are many resources on how to write good README files, such as Awesome README.
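As a starting point, a README skeleton can be generated directly from the bullet points above. The Python sketch below is one possible approach; the section titles and the helper name readme_skeleton are illustrative assumptions, not a standard.

```python
# One heading per question a README should answer (from the list above).
SECTIONS = {
    "About": "What the project does and why it is useful.",
    "Getting Started": "How to install and start using the project.",
    "Getting Help": "Where users can ask questions or report problems.",
    "Contributors": "Who maintains and contributes to the project.",
    "Reproducing the Analysis": "How to repeat the analysis (data projects only).",
}


def readme_skeleton(project_name: str) -> str:
    """Return Markdown text with a title and one placeholder per section."""
    lines = [f"# {project_name}", ""]
    for heading, prompt in SECTIONS.items():
        lines += [f"## {heading}", "", f"TODO: {prompt}", ""]
    return "\n".join(lines)


print(readme_skeleton("my-project"))
```

Writing the result with Path("README.md").write_text(readme_skeleton("my-project")) gives you a file where every section is a visible TODO, which makes it harder to forget one.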

Exercise

Choose 2 README files from the Awesome README gallery examples or from projects that you regularly use and discuss with a group:

What are common sections?

What is the purpose of the file?

What useful information does it contain?

Licenses

As a creative work, software is subject to copyright. When code is published without a license describing the terms under which it can be used by others, all of the author’s rights are reserved by default. This means that no-one else is allowed to copy, re-use, or adapt the software without the express permission of the author. Such cases are surprisingly common but, if you want your methods to be useful to, and used by, other people you should make sure to include a license to tell them how you want them to do this.

Choosing a license for your software can be intimidating and confusing, and you should make sure you feel well-informed before you do so. This lesson and the paper linked from it provide more information about why licenses are important, which are in common use for research software, and what you might consider when choosing one for your own project. Choosealicense.com is another helpful tool to guide you through this process.

Exercise

Using the resources linked above, compare the terms of the following licenses:

MIT

GPL

a proprietary license

What do you think are the benefits and drawbacks of each with regard to research software?

Discuss with a partner before sharing your thoughts with the rest of the group.

Open Source, Next Steps

Other common components are:

code of conduct
contributing guidelines
citation

Even more advanced components for building a community:

issue templates
pull request templates
pathways and personas

For training and mentoring, see Mozilla Open Leaders. For reading, check out the curriculum.

Re-organizing a project

Practice working on projects

FIXME: provide an example project folder and spend time sorting it, or give people time to work on their own projects and generate questions.
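When re-organizing an existing project, a first pass can be partly automated by suggesting a bin for each file based on its extension. The extension rules in this Python sketch and the helper name suggest_bins are assumptions for illustration. Python files are deliberately left for a human to decide because, as the earlier exercise shows, a .py file may belong in either package or experiments.

```python
from pathlib import Path

# Assumed extension-to-folder rules; adapt them to your project.
RULES = {
    ".csv": "data",
    ".tsv": "data",
    ".md": "docs",
    ".rst": "docs",
}


def suggest_bins(root: Path) -> dict[str, list[str]]:
    """Group the files directly under `root` by suggested destination folder.

    Nothing is moved. Files without a rule (including .py files, which could
    belong in `package` or `experiments`) are listed under "needs review".
    """
    suggestions: dict[str, list[str]] = {}
    for path in sorted(root.iterdir()):
        if path.is_file():
            destination = RULES.get(path.suffix.lower(), "needs review")
            suggestions.setdefault(destination, []).append(path.name)
    return suggestions
```

Because the function only reports suggestions, you can review the "needs review" list by hand before moving anything (for example with git mv, so the history follows the files).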

Key Points

Data and code should be governed by different principles

A package enables a project to be installed

An environment allows different people to all have the same versions and run software more reliably

Documentation is an essential component of any complete project and should exist with the code

