Introduction
|
Tidy data principles are essential to increase data analysis efficiency and code readability.
Using R and RStudio, it becomes easier to implement good practices in data analysis.
I can make my workflow more reproducible and collaborative by using git and GitHub.
|
R & RStudio, R Markdown
|
R and RStudio make a powerful duo to create R scripts and R Markdown notebooks.
RStudio offers a text editor, a console and some extra features (environment, files, etc.).
R is a functional programming language: everything revolves around functions.
R Markdown notebooks support code execution, report creation and reproducibility of your work.
Literate programming is a paradigm that combines code and text so that the result remains understandable to humans, not only to machines.
|
Visualizing data with ggplot2
|
ggplot2 relies on the grammar of graphics, an advanced methodology to visualise data.
ggplot() creates a coordinate system that you can add layers to.
You pass a mapping using aes() to link dataset variables to visual properties.
You add one or more layers (or geoms) to the ggplot coordinate system and aes mapping.
Building a minimal plot requires supplying a dataset, aesthetic mappings and geometric layers (geoms).
ggplot2 offers advanced graphical visualisations to plot extra information from the dataset.
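The three ingredients of a minimal plot can be sketched as follows. This is only an illustration: the built-in mtcars dataset and the choice of variables (wt, mpg, cyl) are assumptions made for the example, not part of the lesson material.

```r
library(ggplot2)

# Minimal plot: a dataset, an aesthetic mapping and one geometric layer.
# mtcars ships with R and stands in for any data frame here.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Extra information from the dataset: map a third variable to colour
# and add a second layer with a linear trend per group.
p_extra <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Typing p or p_extra at the console renders the plot.
```

Each `+` adds a layer on top of the coordinate system created by ggplot(), so a plot can be built up step by step.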
|
Data transformation with dplyr
|
The filter() function subsets a dataframe by rows.
The select() function subsets a dataframe by columns.
The mutate() function creates new columns in a dataframe.
The group_by() function groups rows by the unique values of one or more columns.
This grouping information is used by summarize() to make new columns that define aggregate values across groupings.
The pipe operator %>% (read as "then") allows you to chain successive operations without defining intermediate variables, which keeps the analysis concise and easy to read.
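The verbs above can be chained in a single pipeline. The sketch below uses a small made-up table; the names measurements, site, year and height_cm are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical data, for illustration only
measurements <- tibble(
  site      = c("A", "A", "B", "B", "B"),
  year      = c(2019, 2020, 2019, 2020, 2020),
  height_cm = c(10, 12, 8, 9, 11)
)

site_means <- measurements %>%
  filter(year == 2020) %>%                   # subset rows
  select(site, height_cm) %>%                # subset columns
  mutate(height_m = height_cm / 100) %>%     # create a new column
  group_by(site) %>%                         # one group per site
  summarize(mean_height_m = mean(height_m))  # one aggregate row per group

print(site_means)
```

Reading each %>% as "then" gives: take measurements, then filter, then select, and so on, with no intermediate variables to keep track of.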
|
Data tidying with tidyr
|
The pivot_longer() function turns columns into rows (makes a dataset tidy).
The pivot_wider() function turns rows into columns (makes a dataset wide and more human-readable).
Tidy datasets go hand in hand with ggplot2 plotting.
The complete() function fills in implicitly missing observations (balances the number of observations).
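As a rough illustration of these three functions, the sketch below reshapes a tiny made-up table. The table and its column names (country, year, value) are assumptions for the example only.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide <- tibble::tibble(
  country = c("NL", "BE"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 21)
)

# Columns into rows: one observation per row (tidy)
long <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")

# Rows back into columns: wide and easier to scan by eye
back <- pivot_wider(long, names_from = year, values_from = value)

# Drop one observation, then make the implicit gap explicit with NA
incomplete <- long[-4, ]
completed  <- complete(incomplete, country, year)

print(completed)
```

After complete(), every combination of country and year has a row, and the missing measurement shows up as an explicit NA.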
|
Programming with R
|
An R script is a plain text file with an .R extension that you can execute.
Comments in an R script are written after a # (hash) sign.
Loops allow you to automate a series of similar actions.
if/else conditions help you control the execution of your R script.
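A short script combining a comment, a loop and an if/else condition might look like the sketch below. The temperatures vector and the thresholds are made up for the example.

```r
# Hypothetical measurements, for illustration only
temperatures <- c(12, 25, 31)

# Pre-allocate the result, then fill it one element at a time
labels <- character(length(temperatures))

for (i in seq_along(temperatures)) {
  if (temperatures[i] > 30) {
    labels[i] <- "hot"
  } else if (temperatures[i] > 20) {
    labels[i] <- "warm"
  } else {
    labels[i] <- "cold"
  }
}

print(labels)
```

Saved as a plain text file with an .R extension, this can be run from RStudio or with Rscript on the command line.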
|
Functional programming in R
|
A function in R consists of a name, one or several arguments, a body and an execution environment.
Functions help avoid code repetition and the mistakes that come with it.
The name of a function should contain a verb to describe its action.
Vectorised operations can replace for loops and make your code more readable and maintainable.
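The points above can be sketched with one small function. The function name convert_celsius_to_kelvin and the input values are hypothetical, chosen only to show a verb-based name and a vectorised call.

```r
# The name contains a verb that describes the action
convert_celsius_to_kelvin <- function(temp_c) {
  temp_c + 273.15
}

temps_c <- c(0, 20, 100)

# Loop version: one element at a time
temps_k_loop <- numeric(length(temps_c))
for (i in seq_along(temps_c)) {
  temps_k_loop[i] <- convert_celsius_to_kelvin(temps_c[i])
}

# Vectorised version: the whole vector in a single call
temps_k <- convert_celsius_to_kelvin(temps_c)

print(temps_k)
```

Because arithmetic in R already works element-wise on vectors, the single call gives the same result as the loop with less code to maintain.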
|
Version control with git
|
In a version control system, file names do not need to reflect their versions: the system keeps track of the history for you.
git acts as a time machine for files in a given repository under version control.
git allows you to test changes and discard them if not relevant.
A new RStudio project can be smoothly integrated with git to allow you to version control scripts and other files.
|
Collaborating with yourself and others using GitHub
|
GitHub allows you to synchronise work efforts and collaborate with other scientists on (R) code.
GitHub can be used to publish a custom website on the internet.
Merge conflicts can arise even when you work alone, for example when you edit the same file on different machines.
Merge conflicts also arise when you collaborate and are a safe way to handle conflicting changes.
Efficient collaboration on data analysis is possible using GitHub.
|
Become a champion of open (data) science
|
Make your data and code available to others
Make your analyses reproducible
Make a sharp distinction between exploratory and confirmatory research
|