Hi, my name is Sarah Brown and I’m a postdoctoral researcher in Data Science at Brown University.
I’m building an open curriculum to teach researchers to release their Python code in ways that better support repeatability and collaboration. The lesson design follows the Carpentries style: learner-centric, accessible, and delivered through participatory live coding. The lesson will teach concepts of project organization, packaging, environments, documentation, and publishing. Many researchers are unfamiliar with these topics: whether they are self-taught or trained with a focus on theory, they rarely get the chance to learn these very practical skills.
I hope my curriculum can move us away from a state where much research code is never released, and much of what is released is a fixed script. Such a script makes it easy to reproduce a paper’s result but hard to compare a new technique against it or to apply the method to a new dataset. To get there, I aim to fill a gap in researcher training with a minimal set of practices that researchers can adopt without having to learn many specialized tools. I have found that much of the documentation and tooling for these concepts focuses on larger software projects. I think data analysis projects are different enough in nature to deserve their own support in the form of tutorials and conventions.
So far, I have built an outline with learning objectives and started filling in some activities. I will pilot the workshop as a full-day session at Brown University in February. I would welcome additional activities and examples for the workshop, as well as thoughtful conversation about what a minimal set of open source practices for this purpose could look like. I plan to make the lesson more widely available through the Carpentries Lab program once it launches.