Summary and Schedule

This is a lesson for learning how to create scientific data visualizations in R. The lesson uses examples from charts created by Black social scientist W.E.B. Du Bois and a diverse team of collaborators for the 1900 Paris World Expo.

The Du Bois team’s visualizations were state of the art in 1900, using most of the major chart types still employed across the sciences and engineering today. The Du Bois charts provided some of the first widely accessible scientific analyses to refute false, biologically based theories of racial inequality. For their innovative analyses, beauty, and historical importance, the original hand-drawn charts are preserved in the U.S. Library of Congress.

The lesson is designed for learners with little or no programming experience. The entire lesson can be taught in a two-day workshop. Alternatively, selected episodes from the lesson can be taught in about 3 hours of lecture and lab within a STEM course.

This lesson is part of a series of Learning STEM Data Visualization with Du Bois modules for learning data visualization in R, Python, Stata, and more. Development and testing of the modules is funded by the National Science Foundation.

Overall Lesson Learning Objectives


Each episode of the lesson has more specific learning objectives. The overall lesson learning objectives are:

  • Identify how data visualization promotes scientific discovery.
  • Appreciate early innovations in data visualization by W.E.B. Du Bois and other Black and women scientists in his Atlanta University Laboratory.
  • Analyze the appropriate types of data visualization charts for different kinds of measurements, relationships, and scientific findings.
  • Engage in a creative process of data visualization in the style of W.E.B. Du Bois using R.

Prerequisite

Getting started

This lesson assumes no prior knowledge of the skills or tools.

There are three ways to engage with the coding portions of this lesson:

  1. Use RStudio or your preferred R editor on your own computer. Recommended for students who already use R or plan to continue doing so.
  2. Use a JupyterLite Du Bois Notebook with an R kernel in your web browser with no installation, or use another JupyterLite tool with R. Recommended for students who do not already use R but may do so more in the future.
  3. Use JupyterLite guided tutorials at the Du Bois Module Site. Recommended when instructors want beginner students to be able to do exercises independently without an instructor present.

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

  • The main goal here is to help the learners be comfortable with the RStudio interface.
  • Go very slowly in the “Getting set up” section. Make sure everyone is following along (remind learners to use the stickies). Plan with the helpers at this point to go around the room, and be available to help. It’s important to make sure that learners are in the correct working directory, and that they create a data (all lowercase) subfolder.

Data Sets


Each episode provides links to its data sets, which can be read into R data frames directly from their source URLs.
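
As a minimal sketch of reading a data set into a data frame, the example below uses base R's read.csv(), which accepts either a local file path or a URL. Because this page does not list the episode links, the sketch writes a small placeholder file to a temporary location and reads it back; the file contents and the names csv_source and example_data are assumptions made for illustration, not part of the lesson data.

```r
# Hypothetical stand-in for an episode's data link. A source URL string
# can be passed to read.csv() in place of this local path.
csv_source <- tempfile(fileext = ".csv")

# Placeholder values for illustration only; not taken from the Du Bois charts.
writeLines(c("year,value", "1890,10", "1900,20"), csv_source)

# read.csv() returns a data frame with one column per CSV field.
example_data <- read.csv(csv_source)

# Inspect the column names and types before plotting.
str(example_data)
```

In an episode, the same call would be given the data link provided there instead of the temporary file.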

Setup instructions for RStudio


For students using RStudio on their own computers, follow these directions if you have not already installed R and RStudio.

R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once installed, because RStudio is an IDE, RStudio will run R in the background. You do not need to run it separately.

After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools within R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.

Windows

If you already have R and RStudio installed

  • Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
  • To check which version of R you are using, start RStudio; the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package, by running:

R

# Install the installr package only if it is missing, then update R.
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
installr::updateR(TRUE)

If you don’t have R and RStudio installed

  • Download R from the CRAN website.
  • Run the .exe file that was just downloaded.
  • Go to the RStudio download page.
  • Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
  • Double click the file to install it.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

macOS

If you already have R and RStudio installed

  • Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
  • To check the version of R you are using, start RStudio; the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. In any case, make sure you have at least R 3.2.
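
The version check described above can also be done in code. The following is a minimal sketch using base R's getRversion(); the variable names minimum_version and meets_minimum are assumptions introduced for this example, and the threshold is the R 3.2 minimum stated in these instructions.

```r
# Minimum version stated in the setup instructions.
minimum_version <- "3.2.0"

# getRversion() returns the running R version in a form that can be
# compared directly against a version string.
meets_minimum <- getRversion() >= minimum_version

if (meets_minimum) {
  message("R ", as.character(getRversion()), " meets the minimum of ", minimum_version, ".")
} else {
  message("R ", as.character(getRversion()), " is older than ", minimum_version, "; please update R.")
}
```

This only compares against the stated minimum. It does not check whether a newer release is available on CRAN.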

If you don’t have R and RStudio installed

  • Download R from the CRAN website.
  • Select the .pkg file for the latest R version.
  • Double click on the downloaded file to install R.
  • It is also a good idea to install XQuartz (needed by some packages).
  • Go to the RStudio download page.
  • Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
  • Double click the file to install RStudio.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

Linux

  • Follow the instructions for your distribution from CRAN; they provide information to get the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach as the versions it provides are usually out of date. In any case, make sure you have at least R 3.2.
  • Go to the RStudio download page.
  • Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
  • Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).

For everyone

After installing R and RStudio, you need to install the tidyverse and here packages.

  • After starting RStudio, at the console type: install.packages("tidyverse") followed by the enter key. Once this has installed, type install.packages("here") followed by the enter key. Both packages should now be installed.

  • For reference, the lesson uses SAFI_clean.csv. The direct download link for this file is: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.

  • The json episode uses SAFI.json. The file is available on GitHub here.
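
The package installation step under “For everyone” can be sketched as a check that reports which of the two required packages are still missing. This is one possible approach rather than a required part of setup; the helper missing_packages() is a hypothetical name introduced for this example, and the install line is left commented out because it needs an internet connection.

```r
# Packages named in the setup instructions.
needed <- c("tidyverse", "here")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)

# Report what is missing; installing is kept as a separate, explicit step.
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install (requires internet)
} else {
  message("All required packages are installed.")
}
```

Running the check again after installation should report that all required packages are installed.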