This lesson is in the early stages of development (Alpha version)

Life Sciences Workshop: Overview

Data Carpentry provides workshops on fundamental data concepts, with the aim of providing the skills required to get started with modern data analysis. This workshop uses a collection of results files from a typical plate reader, of the kind found in most life sciences labs.

The workshop primarily involves spreadsheet software (such as Microsoft Excel) and R, but assumes no prior knowledge of the latter. Basic familiarity with spreadsheets is expected.

Note that although the dataset used throughout these lessons is scientific, the ideas and principles explored could easily be applied to other data analysis scenarios outside of a life sciences context.

Motivation

The ability to load data into Excel, perform calculations and create plots is widespread. However, this ‘mouse-click’ driven approach to analysis, combined, perhaps, with a lack of good data management principles, can lead to analytical results that are ambiguous, non-reproducible, and unusable in the long term.

Applying basic data management ideas, together with the ability to load, explore and analyse data in a code-based environment such as R, can significantly improve the transparency, reproducibility and long-term usefulness of your research.

Structure

Day 1 is aimed at a general audience of anyone who uses spreadsheets for their work. The material does not cover the mechanics of using a spreadsheet (calculations, loading data, etc.), since that knowledge is already widespread. Instead, it focuses on how to organise data files and the data within them, to maximise the transparency and reproducibility of your work.

Day 2 is for those who wish to go beyond spreadsheets for their data analysis, and introduces R. It covers the basics first and builds up to fitting a 4-parameter logistic (4PL) regression curve to typical enzyme-linked immunosorbent assay (ELISA) data.

Day 3 is an introduction to statistics using R. It focuses on ‘before and after’ comparisons, which are common in life sciences labs (e.g. evaluating results before and after a reagent change).

Prerequisites

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.

Schedule

Setup  Download files required for the lesson.
00:00  1. Data Management: Introduction
       How do most people organise their data and associated files?
       What are some common issues with not having a clear plan for the naming and storage of files?
       Why is this important? (reproducible research)
00:15  2. Data Management: File-naming and folder structures
       How should files be named?
       How should folders be organised?
00:40  3. Data Management: Meta-data
       What is meta-data?
       Why is it useful?
01:03  4. Data Management: Raw Data
       What is raw data?
       Why is it important?
01:08  5. Data Management: Data Management Planning
       What is a data management plan and do I need one?
01:23  6. Spreadsheets: Introduction
       When should you use a spreadsheet?
01:43  7. Spreadsheets: Guiding Principles
       How should spreadsheets be used to maximise efficiency and reproducibility?
02:23  8. R introduction: Introduction
       Why bother to code?
02:38  9. R introduction: Tidy data
       What is tidy data?
03:13  10. R introduction: R and RStudio
       How do we use R?
03:33  11. R introduction: R fundamentals
       What are the basic features of R?
03:33  12. R introduction: Data Manipulation and Plotting
       How do we get going with real data?
05:08  13. R introduction: Beyond Base R
       How do we go beyond R’s base functionality?
06:03  14. Statistics in R
       What are some basic, core concepts in statistics?
       How do you apply them in R?
       How should you analyse ‘before and after’ results?
06:06  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.