Summary and Setup

This is a lesson for learning how to create scientific data visualizations in R. The lesson uses examples from charts created by Black social scientist W.E.B. Du Bois and a diverse team of collaborators for the 1900 Paris World Expo.
The Du Bois team’s visualizations were state of the art in 1900, using most of the major chart types still employed across the sciences and engineering today. The Du Bois charts provided some of the first widely accessible scientific analyses to refute false, biologically based theories of racial inequality. For their innovative analyses, beauty, and historical importance, the original hand-drawn charts are preserved in the U.S. Library of Congress.
The lesson is designed for learners with little or no programming experience. The entire lesson can be taught in a two-day workshop. Alternatively, selected episodes can be taught in about three hours of lecture and lab within a STEM course.
This lesson is part of a series of Learning STEM Data Visualization with Du Bois modules for learning data visualization in R, Python, Stata, and more. Development and testing of the modules is funded by the National Science Foundation.
Overall Lesson Learning Objectives
Each episode of the lesson has more specific learning objectives. The overall lesson learning objectives are:
- Identify how data visualization promotes scientific discovery.
- Appreciate early innovations in data visualization by W.E.B. Du Bois and other Black and women scientists in his Atlanta University Laboratory.
- Analyze the appropriate types of data visualization charts for different kinds of measurements, relationships, and scientific findings.
- Engage in a creative process of data visualization in the style of W.E.B. Du Bois using R.
Getting started
This lesson assumes no prior knowledge of the skills or tools.
There are three ways to engage with the coding portions of this lesson:
- Use RStudio or your preferred R editor on your own computer. Recommended for students who already use R or plan to continue doing so.
- Use a JupyterLite Du Bois Notebook with an R kernel in your web browser, with no installation required, or use another JupyterLite tool that supports R. Recommended for students who do not already use R but may use it more in the future.
- Use the JupyterLite guided tutorials at the Du Bois Module Site. Recommended when instructors want beginning students to be able to do exercises independently, without an instructor present.
Data Sets
Links to data sets are provided within each episode so you can read them into R data frames directly from their source URLs.
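Reading a data set straight from its source URL is a one-liner in base R, for example `read.csv(url)`. The sketch below goes one step further with a hypothetical helper, `read_csv_cached()` (not part of the lesson), that downloads the file the first time it is needed, keeps a local copy, and reads that copy on later runs. It uses base R's `download.file()` and `read.csv()` so it works before the tidyverse is installed; the episodes themselves may use different functions.

```r
# Hypothetical helper (not from the lesson): read a CSV from a source URL,
# keeping a local copy so later runs don't need to download it again.
read_csv_cached <- function(url, dest) {
  if (!file.exists(dest)) {
    # Create the destination folder if needed, then download the file
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download.file(url, dest, mode = "wb")
  }
  # Base R reader; readr::read_csv() from the tidyverse is an alternative
  read.csv(dest)
}
```

For example, assuming GitHub's usual raw-file address for the SAFI_clean.csv file mentioned later on this page, `safi <- read_csv_cached("https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv", "data/SAFI_clean.csv")` would download it into a `data` folder on first use and read the saved copy afterwards.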
Setup instructions for RStudio
For students using RStudio on their own computers, follow these directions if you have not already installed R and RStudio.
R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once both are installed, RStudio runs R in the background, so you do not need to run R separately.
After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools for R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.
Windows
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check which version of R you are using, start RStudio: the first thing that appears in the console shows the version of R you are running. Alternatively, you can type sessionInfo(), which also displays the version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package by running:
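R
# Install the installr package only if it is not already available
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
# Check CRAN for a newer version of R and install it
installr::updateR(TRUE)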
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Run the .exe file that was just downloaded.
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
- Double click the file to install it.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
macOS
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check which version of R you are using, start RStudio: the first thing that appears in the console shows the version of R you are running. Alternatively, you can type sessionInfo(), which also displays the version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, download and install it. In any case, make sure you have at least R 3.2.
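The version check described above can also be run from the console. This sketch compares the running R version against the 3.2 minimum stated here using `getRversion()`, which returns a version object rather than plain text, so versions compare numerically. The variable names `min_version` and `r_is_recent_enough` are illustrative choices, not part of the lesson.

```r
# Minimum version stated in these setup instructions
min_version <- "3.2.0"

# Human-readable version string, e.g. "R version 4.3.2 (2023-10-31)"
print(R.version.string)

# getRversion() returns a version object that compares component by
# component against a version string (so "3.10" counts as newer than "3.2")
r_is_recent_enough <- getRversion() >= min_version

if (!r_is_recent_enough) {
  warning("R ", getRversion(), " is older than ", min_version,
          "; please install a newer version from CRAN.")
}
```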
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Select the .pkg file for the latest R version.
- Double click on the downloaded file to install R.
- It is also a good idea to install XQuartz (needed by some packages).
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
- Double click the file to install RStudio.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
- Follow the instructions for your distribution from CRAN; they explain how to get the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don't recommend this approach because the versions it provides are usually out of date. In any case, make sure you have at least R 3.2.
- Go to the RStudio download page.
- Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., on Debian/Ubuntu, run sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
- Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g., sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).
For everyone
After installing R and RStudio, you need to install the tidyverse and here packages.
After starting RStudio, type the following at the console and press Enter:
install.packages("tidyverse")
Once this has installed, type the following and press Enter:
install.packages("here")
Both packages should now be installed.
For reference, the lesson uses SAFI_clean.csv. The file is available on GitHub at: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned-up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.
The json episode uses SAFI.json. This file is also available on GitHub.
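The tidyverse and here installation steps above can also be scripted so that rerunning them is harmless. The sketch below defines two hypothetical helpers (not part of the lesson): `install_if_missing()` installs only the packages that are not already present, and `check_packages()` uses `requireNamespace()` to confirm each package can be loaded. The CRAN mirror passed as `repos` is an assumption; any CRAN mirror should work, and setting one avoids a mirror-selection prompt.

```r
# Hypothetical helper: install each package in `pkgs` that is not already
# installed, and return (invisibly) the names of the packages it tried to install.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # Row names of installed.packages() are the names of installed packages
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)
}

# Hypothetical helper: TRUE for each package that can be loaded
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}
```

Running `install_if_missing(c("tidyverse", "here"))` and then `check_packages(c("tidyverse", "here"))` at the console should report `TRUE` for both packages once installation has succeeded.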