Scientific reproducibility: What is it for?
|
Reproducible research is key for scientific advancement.
RStudio can help you organize your work, keep better control over it, and produce reproducible research.
|
Good Practices for Managing Projects in RStudio
|
Use best practices for file and folder organization. This includes using relative file paths rather than absolute file paths.
Make sure that all data are backed up on multiple devices and that you treat raw data as read-only.
We can use Git and GitHub to keep track of what we’ve done in the past and what we plan to do in the future.
Rproj files are pivotal to keeping everything bundled and organized.
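As a minimal sketch of the relative-path practice above, the snippet below builds a path from the project root instead of hard-coding a machine-specific location. The folder and file names (data, raw, survey.csv) are hypothetical and not taken from the lesson.

```r
# Hypothetical project layout: <project root>/data/raw/survey.csv
raw_data_path <- file.path("data", "raw", "survey.csv")

# The same relative path resolves on any machine, provided the working
# directory is the project root (opening the .Rproj file sets this).
print(raw_data_path)

# An absolute path such as "C:/Users/me/project/data/raw/survey.csv"
# would break as soon as the project is moved or shared.
is_absolute <- grepl("^(/|[A-Za-z]:)", raw_data_path)
```

Because the path is built with file.path(), no operating-system-specific separator needs to be typed by hand.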
|
Navigating RStudio and Quarto Documents
|
RStudio has four panels to organize your code and environment.
Manage packages in RStudio using specific functions.
Quarto documents combine text and code.
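The package-management point above can be sketched with base R functions. This is one possible pattern rather than the lesson's prescribed one, and the package names are stand-ins (stats and utils ship with R) so the example runs without installing anything.

```r
# Stand-in package names; replace with the packages your project needs.
required_packages <- c("stats", "utils")

# requireNamespace() returns TRUE when a package is installed and loadable.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
not_installed <- required_packages[!is_installed]

# Install only what is missing, then attach each package with library().
if (length(not_installed) > 0) {
  install.packages(not_installed)
}
for (pkg in required_packages) {
  library(pkg, character.only = TRUE)
}
```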
|
Working with projects in RStudio
|
RStudio has Git version control functionality built in.
Forking a GitHub repository makes a copy of the repository into your personal account on GitHub.
You can clone a Git repository from GitHub to your local disk using RStudio.
For this workshop, each learner will work with their own fork of the “R-Repro-pub” repository.
|
Introduction to Working with Quarto documents
|
Quarto lets you create reproducible documents.
A qmd file comprises a YAML header, markdown-formatted text, and code blocks.
The render function converts the file into the chosen output format.
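To make the three parts of a qmd file concrete, the sketch below writes a minimal document to a temporary folder from R. The title, heading, and file name are hypothetical, and the built-in cars dataset stands in for real data. The commented render call assumes the quarto R package and the Quarto command-line tool are installed.

```r
fence <- strrep("`", 3)  # three backticks, used to delimit the code block

qmd_lines <- c(
  "---",                                   # YAML header starts
  "title: \"Hypothetical report\"",
  "format: html",
  "---",                                   # YAML header ends
  "",
  "## Summary",                            # markdown-formatted text
  "",
  "This paragraph is ordinary *markdown* text.",
  "",
  paste0(fence, "{r}"),                    # code block starts
  "summary(cars$speed)",
  fence                                    # code block ends
)

qmd_file <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_file)

# Rendering converts the file into the format named in the YAML header.
# One option from R, assuming the quarto package and Quarto CLI are available:
# quarto::quarto_render(qmd_file)
```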
|
Writing and Styling Qmd Documents
|
The visual editor has made formatting much easier.
You can apply Qmd styling without prior Quarto knowledge.
You can include inline code in narrative text for basic calculations and dynamic information.
|
Adding Code to Quarto Documents
|
Knitr will render your code and markdown-formatted text and output your document format of choice.
Code chunks are runnable pieces of R code.
Setting your working directory at the project level can effectively mitigate path-related challenges encountered while working on Quarto documents.
|
Rendering & Customizing Code Outputs
|
Each time you render/knit the document, calculations and plots will run and be displayed.
Options for code chunks can be set at the document level.
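As one illustration of document-level options, knitr's opts_chunk$set() can be called from a setup chunk so that every later chunk inherits the same defaults. Quarto documents can also set such defaults under the execute key of the YAML header. The specific option values below are assumptions chosen for illustration.

```r
# Run inside a setup chunk near the top of the document.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = FALSE,     # hide source code in the rendered output
    warning = FALSE,  # suppress warnings
    message = FALSE   # suppress package start-up messages
  )
}
```

Individual chunks can still override these defaults with their own options.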
|
Advanced Code Chunk Options
|
Learn how to source external code with source().
Learn how to modularize your code to make it more reproducible.
Use a chunk at the beginning of your document to load libraries and data to make your document more efficient.
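A minimal sketch of sourcing external code: a helper function lives in its own script and is pulled into the document with source(). The script name and the celsius_to_fahrenheit() helper are hypothetical; the script is written to a temporary folder here only so the example is self-contained.

```r
# Create a hypothetical helper script (in a real project this file would
# already exist, for example under a scripts/ or R/ folder).
helper_file <- file.path(tempdir(), "helpers.R")
writeLines(c(
  "# Convert a temperature from Celsius to Fahrenheit",
  "celsius_to_fahrenheit <- function(celsius) celsius * 9 / 5 + 32"
), helper_file)

# source() runs the script, making its functions available here.
source(helper_file)

boiling_f <- celsius_to_fahrenheit(100)
print(boiling_f)
```

Keeping helpers in separate scripts lets several documents reuse the same tested function instead of copying it between chunks.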
|
Bibliography, Citations & Cross-Referencing
|
RStudio supports different lookup strategies to make the citation process easier.
RStudio supports different citation styles.
The YAML header can be adjusted (for example, with the nocite field) to display uncited items in the reference list.
Use bookdown to cross-reference content.
|
Using Git in RStudio
|
|
Collaborating via GitHub
|
Set up RStudio to authenticate with GitHub using a Personal Access Token (PAT).
Setting the Git repository origin in your RStudio project enables pushing and pulling between your local copy of the repository and the repository on GitHub.
|
Managing Dependencies in R/RStudio
|
Run the sessionInfo() function to take a snapshot of your computational environment.
Groundhog is a handy tool for capturing your project’s package dependencies.
Make your R scripts reproducible by replacing library(pkg) with groundhog.library(pkg, date).
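The snapshot step above can be sketched as follows: the output of sessionInfo() is saved to a text file that can be kept alongside the project. The file name is hypothetical. The groundhog call is left as a comment because it installs packages; the package name and date in it are placeholders, not values from the lesson.

```r
# Capture the R version, platform, and attached packages.
session_snapshot <- utils::sessionInfo()

# Save a human-readable copy (hypothetical file name and location).
snapshot_file <- file.path(tempdir(), "session-info.txt")
writeLines(utils::capture.output(print(session_snapshot)), snapshot_file)

# With groundhog, a dated call replaces library(pkg), for example:
# groundhog::groundhog.library("dplyr", "2024-01-15")  # placeholder pkg and date
```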
|
Publishing your project
|
You may choose to share and publish your data project before publishing its associated manuscript.
Sharing the code, data, and documentation is necessary to allow for inspection and research reproducibility.
The quarto-journals GitHub organization has journal article formats available for use.
|
Creating and sharing reproducible environments with renv
|
renv is a handy tool for capturing your project’s package dependencies.
renv creates a JSON lock file that documents dependencies and lets users restore the original versions used for a particular project.
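A rough sketch of how the renv steps usually fit together is given below. The renv_step variable is a hypothetical switch added for this example so that nothing changes in your project unless you choose a step; renv::init(), renv::snapshot(), and renv::restore() are the renv functions themselves.

```r
# Hypothetical switch: one of "init", "snapshot", "restore", or "none".
renv_step <- "none"

if (requireNamespace("renv", quietly = TRUE)) {
  switch(renv_step,
    init     = renv::init(),      # author, once: create project library and lock file
    snapshot = renv::snapshot(),  # author: record current package versions in renv.lock
    restore  = renv::restore(),   # collaborator: reinstall the versions in renv.lock
    none     = invisible(NULL)    # default: do nothing
  )
}
```

Run each step from the project root. The lock file (renv.lock) is the piece to commit and share with collaborators.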
|