This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Open Data Science with R: Glossary

Key Points

Introduction
  • Tidy data principles are essential to increase data analysis efficiency and code readability.

  • R and RStudio make it easier to implement good practices in data analysis.

  • I can make my workflow more reproducible and collaborative by using git and GitHub.

R & RStudio, R Markdown
  • R and RStudio make a powerful duo to create R scripts and R Markdown notebooks.

  • RStudio offers a text editor, a console and some extra features (environment, files, etc.).

  • R is a functional programming language: everything revolves around functions.

  • R Markdown notebooks support code execution, report creation and reproducibility of your work.

  • Literate programming is a paradigm to combine code and text so that it remains understandable to humans, not only to machines.
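To illustrate "everything revolves around functions", here is a minimal sketch in base R. Even an operator such as `+` is a function that can be called in prefix form, and functions are ordinary values that can be passed to other functions. The variable names are illustrative only. Strictly speaking, R supports a functional style rather than being a purely functional language.

```r
# Arithmetic operators are functions too: both expressions evaluate to 5
sum_infix  <- 2 + 3
sum_prefix <- `+`(2, 3)

# Functions are values: sqrt is passed as an argument to sapply(),
# which applies it to every element of the vector
roots <- sapply(c(1, 4, 9), sqrt)
roots
```

Inside an R Markdown notebook, the same lines would sit in an R code chunk, with the surrounding prose explaining them.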

Visualizing data with ggplot2
  • ggplot2 relies on the grammar of graphics, an advanced methodology to visualise data.

  • ggplot() creates a coordinate system that you can add layers to.

  • You pass a mapping using aes() to link dataset variables to visual properties.

  • You add one or more layers (or geoms) to the ggplot coordinate system and aes mapping.

  • Building a minimal plot requires supplying a dataset, aesthetic mappings and at least one geometric layer (geom).

  • ggplot2 lets you display extra information from the dataset by mapping more variables to aesthetics (colour, size, shape) and by adding further layers.
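The three ingredients of a minimal plot can be sketched as follows. This assumes the built-in `mtcars` dataset, not the lesson's own data. `aes()` maps car weight to x, fuel efficiency to y and the number of cylinders to colour (an extra variable). Two layers are then added with `+`.

```r
library(ggplot2)

# 1. dataset: mtcars (built into R)
# 2. mapping: aes() links dataset variables to visual properties
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, colour = factor(cyl))) +
  # 3. geom layer: one point per car
  geom_point() +
  # extra layer: one straight-line fit per cylinder group, without error band
  geom_smooth(method = "lm", se = FALSE)

# Printing the object draws the plot
p
```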

Data transformation with dplyr
  • The filter() function subsets a dataframe by rows.

  • The select() function subsets a dataframe by columns.

  • The mutate() function creates new columns in a dataframe.

  • The group_by() function creates groups of unique column values.

  • This grouping information is used by summarize() to compute aggregate values (such as counts or means), returning one row per group.

  • The pipe operator %>% (read as "then") lets you chain successive operations without defining intermediate variables, producing a concise and easily read analysis.
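The verbs above can be chained in a single pipeline. This minimal sketch assumes the built-in `mtcars` dataset. There, `am == 1` marks manual-transmission cars and `wt` is weight in 1000 lbs. The new column `wt_kg` and the summary column names are illustrative choices.

```r
library(dplyr)

# Each step receives the result of the previous one through %>%
manual_summary <- mtcars %>%
  filter(am == 1) %>%                  # rows: keep only manual-transmission cars
  select(cyl, mpg, wt) %>%             # columns: keep three variables
  mutate(wt_kg = wt * 453.6) %>%       # new column: convert 1000 lbs to kg
  group_by(cyl) %>%                    # one group per unique number of cylinders
  summarize(n = n(),                   # aggregates, one row per group
            mean_mpg = mean(mpg),
            mean_wt_kg = mean(wt_kg))

manual_summary
```

No intermediate variables are needed. Each line reads as one step: "take mtcars, then filter, then select, ...".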

Data tidying with tidyr
  • The pivot_longer() function turns columns into rows (makes a dataset longer and, typically, tidy).

  • The pivot_wider() function turns rows into columns (makes a dataset wider and often easier for humans to read).

  • Tidy datasets go hand in hand with ggplot2 plotting.

  • The complete() function makes implicitly missing observations explicit (balancing the number of observations).
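A minimal sketch of these three tidyr functions on a small made-up table follows. The countries "A" and "B" and their values are hypothetical. Country B has no 2010 row, so that observation is implicitly missing. complete() adds it with an NA value. The table is then widened and lengthened again.

```r
library(tidyr)

# Hypothetical long (tidy) data: B has no row for 2010
long <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2000, 2010, 2000),
  value   = c(1.5, 1.8, 2.0)
)

# Make the missing B-2010 observation explicit (value becomes NA)
balanced <- complete(long, country, year)

# Rows to columns: one column per year, one row per country
wide <- pivot_wider(balanced, names_from = year, values_from = value)

# Columns back to rows: every column except country is gathered
back_to_long <- pivot_longer(wide, cols = -country,
                             names_to = "year", values_to = "value")
```

After the round trip, `year` is stored as character because it came from column names.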

Programming with R
  • An R script is a plain text file with an .R extension that you can execute.

  • Comments in an R script start with a # (hash) character.

  • Loops allow you to automate a series of similar actions.

  • if/else conditions let you control which parts of your R script are executed.
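Comments, a loop and an if/else condition can be combined in a short script. This sketch uses made-up temperature values and hypothetical category labels. The loop repeats the same action for each value. The if/else chain decides which assignment runs.

```r
# Hypothetical temperatures (degrees Celsius) to classify
temperatures <- c(12, 25, 31)
labels <- character(length(temperatures))  # pre-allocate the result

for (i in seq_along(temperatures)) {
  # Only one branch runs for each value
  if (temperatures[i] > 30) {
    labels[i] <- "hot"
  } else if (temperatures[i] > 20) {
    labels[i] <- "warm"
  } else {
    labels[i] <- "cool"
  }
}

labels
```

Saved in a file with an .R extension (for example a hypothetical classify.R), the script could be run with source("classify.R").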

Functional programming in R
  • A function in R consists of a name, one or several arguments, a body and an execution environment.

  • Functions help you avoid code repetition and the mistakes that come with it.

  • The name of a function should contain a verb to describe its action.

  • Vectorised operations can replace for loops and make your code more readable and maintainable.
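The anatomy of a function and the loop-versus-vectorised contrast can be sketched together. The function name `convert_fahr_to_celsius` is a hypothetical example that starts with a verb. Because R's arithmetic operators work element-wise, calling the function once on a whole vector gives the same result as looping over its elements.

```r
# name: convert_fahr_to_celsius; argument: temp_f; body: the formula
convert_fahr_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

temps_f <- c(32, 212, 98.6)

# Loop version: one call per element
celsius_loop <- numeric(length(temps_f))
for (i in seq_along(temps_f)) {
  celsius_loop[i] <- convert_fahr_to_celsius(temps_f[i])
}

# Vectorised version: a single call on the whole vector
celsius_vec <- convert_fahr_to_celsius(temps_f)
celsius_vec
```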

Version control with git
  • With a version control system, file names no longer need to reflect their versions: the system keeps track of the history.

  • git acts as a time machine for files in a given repository under version control.

  • git allows you to test changes and discard them if not relevant.

  • A new RStudio project can be smoothly integrated with git to allow you to version control scripts and other files.

Collaborating with yourself and others with GitHub
  • GitHub allows you to synchronise work and collaborate with other scientists on (R) code.

  • GitHub can host a custom website and make it visible on the internet.

  • Merge conflicts can arise even when you work alone, for example on different machines.

  • Merge conflicts also arise when you collaborate with others; they are a safe way to handle conflicting changes.

  • GitHub makes efficient collaboration on data analysis possible.

Become a champion of open (data) science
  • Make your data and code available to others

  • Make your analyses reproducible

  • Make a sharp distinction between exploratory and confirmatory research

Glossary

FIXME