Summary and Schedule

This is a lesson for learning how to create scientific data visualizations in R. The lesson uses examples from charts created by Black social scientist W.E.B. Du Bois and a diverse team of collaborators for the 1900 Paris World Expo.

The Du Bois team’s visualizations were state of the art in 1900, using most of the major chart types still employed across the sciences and engineering today. The Du Bois charts provided some of the first widely accessible scientific analyses to refute false, biologically based theories of racial inequality. For their innovative analyses, beauty, and historical importance, the original hand-drawn charts are preserved in the U.S. Library of Congress.

The lesson is designed for learners with little or no programming experience. The entire lesson can be taught in a two-day workshop. Alternatively, selected episodes from the lesson can be taught in about 3 hours of lecture and lab within a STEM course (see Instructor Notes for suggested lesson plans). The lesson starts with the social and scientific context in which Du Bois created the visualizations. It then introduces key chart types for different kinds of data, along with visualization best practices. Multiple episodes are then offered for recreating or adapting selected Du Bois charts with modern data, including bar graphs and statistical maps.

This lesson is part of a series of STEM Data Visualization and Du Boisian Methods modules for learning data visualization in R, Python, Stata, and more. Development and testing of the modules is funded by the National Science Foundation.

Overall Lesson Learning Objectives


Each episode of the lesson has more specific learning objectives. The overall lesson learning objectives are:

  • Understand the social and scientific context of visualizations as a process of scientific discovery (observation, hypothesis formation, data collection, analysis).
  • Practice creative and visual thinking as valuable methods for scientific discovery and communication.
  • Comprehend major chart types (pie chart, bar chart, line chart, statistical map) and their suitability for different levels of measurement and multivariate relationships.
  • Apply visualization best practices, including accessible design and data storytelling.
  • Create and modify a Du Bois chart using R to experience the value of coding to reproduce and adapt data visualizations in STEM.
Prerequisite

Getting started

This lesson assumes no prior knowledge of the skills or tools.

There are three ways to engage with the coding portions of this lesson:

  1. Use Jupyter Notebooks with an R Kernel on the Du Bois Cloud Jupyter Hub.
  2. Use Jupyter Notebooks with an R Kernel on students’ own computers.
  3. Use RStudio on students’ own computers.

The Du Bois Cloud option can be done with any web browser and requires no installation. We recommend that option if students are not expecting to do further data visualization using R for their courses or research.

Prerequisites for using students’ own computers

For options 2 and 3, which use students’ own computers, follow the directions in the “Setup” section.

For option 2, students will need JupyterLab, R, and a Jupyter R Kernel.

For option 3, students will need R and RStudio.
To most effectively use these materials, please make sure to install everything before working through this lesson.

Resources

  1. R-Social Sci Index Reference - https://github.com/datacarpentry/r-socialsci/blob/main/index.md
  2. Component Guide for Building Out Formatting Boxes for GitHub

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

  • The main goal here is to help the learners be comfortable with the RStudio interface.
  • Go very slowly in the “Getting set up” section. Make sure everyone is following along (remind learners to use the stickies). Plan with the helpers at this point to go around the room, and be available to help. It’s important to make sure that learners are in the correct working directory, and that they create a data (all lowercase) subfolder.
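
The working-directory check and data subfolder mentioned above could be done from the R console along these lines. This is a minimal sketch that assumes the working directory has already been set to the lesson's project folder; the variable name data_dir is only illustrative.

```r
# Show the folder R is currently working in
print(getwd())

# Build the path to a lowercase "data" subfolder inside the working directory
data_dir <- file.path(getwd(), "data")

# Create the subfolder only if it does not already exist
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Confirm that the subfolder is now present
print(dir.exists(data_dir))
```

If the first line prints an unexpected folder, change the working directory before creating the subfolder.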

Data Sets


Download the data zip file and unzip it to your Desktop.
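
If you prefer to unzip from within R rather than with your file manager, something like the sketch below may work. The file name lesson-data.zip is hypothetical (substitute the name of the zip file you actually downloaded), and the sketch assumes the file was saved to a Desktop folder inside your home directory.

```r
# Hypothetical file name: replace with the name of the zip file you downloaded
zip_path <- file.path(path.expand("~"), "Desktop", "lesson-data.zip")

if (file.exists(zip_path)) {
  # Extract next to the archive and print the paths of the extracted files
  extracted <- unzip(zip_path, exdir = dirname(zip_path))
  print(extracted)
} else {
  # Nothing is extracted if the archive is not where we assumed
  message("Zip file not found at: ", zip_path)
}
```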

Setup instructions


Coding exercises for this lesson can be done either 1) on your own computer using R and RStudio, or 2) using a Jupyter Notebook with an R Kernel on the Du Bois Cloud.

Using Notebooks on the Du Bois Cloud does not require any setup or installation. You can simply follow the hyperlinks provided in this lesson, which will open a Notebook in any web browser.

For those who want to do more data visualization and work with R beyond this lesson, we recommend using R and RStudio. If you have not yet installed R and RStudio, you can do so by following the instructions below.

R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once installed, because RStudio is an IDE, RStudio will run R in the background. You do not need to run it separately.

After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools within R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.

Windows

If you already have R and RStudio installed

  • Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
  • To check which version of R you are using, start RStudio; the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package, by running:

R

# Install the installr package if it is missing, then use it to update R (Windows only)
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
installr::updateR(TRUE)

If you don’t have R and RStudio installed

  • Download R from the CRAN website.
  • Run the .exe file that was just downloaded.
  • Go to the RStudio download page.
  • Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
  • Double click the file to install it.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

macOS

If you already have R and RStudio installed

  • Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
  • To check the version of R you are using, start RStudio; the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. In any case, make sure you have at least R 3.2.
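
The version check described above can also be run directly in the R console. The sketch below uses only base R functions; the variable name meets_minimum is illustrative, and 3.2.0 is simply the minimum version stated in this lesson written out in full.

```r
# Print the full version string of the running R installation
print(R.version.string)

# Compare the running version with the minimum version this lesson asks for
meets_minimum <- getRversion() >= "3.2.0"

# TRUE means the installed R is at least version 3.2
print(meets_minimum)
```

If the last line prints FALSE, install a newer version of R from CRAN before continuing.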

If you don’t have R and RStudio installed

  • Download R from the CRAN website.
  • Select the .pkg file for the latest R version.
  • Double click on the downloaded file to install R.
  • It is also a good idea to install XQuartz (needed by some packages).
  • Go to the RStudio download page.
  • Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
  • Double click the file to install RStudio.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

Linux

  • Follow the instructions for your distribution from CRAN, which provides information on getting the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach because the versions it provides are usually out of date. In any case, make sure you have at least R 3.2.
  • Go to the RStudio download page.
  • Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
  • Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).

For everyone

After installing R and RStudio, you need to install the tidyverse and here packages.

  • After starting RStudio, at the console type: install.packages("tidyverse") followed by the enter key. Once this has installed, type install.packages("here") followed by the enter key. Both packages should now be installed.

  • For reference, the lesson uses SAFI_clean.csv. The direct download link for this file is: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.

  • The json episode uses SAFI.json. The file is available on GitHub here.
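
To confirm that the tidyverse and here packages from the installation step above are available, a check like the following may help. It is a minimal sketch: the variable names are illustrative, and it only reports which packages R can find without installing anything.

```r
# Packages this lesson asks you to install
lesson_packages <- c("tidyverse", "here")

# For each package, TRUE if R can find it in the library, FALSE otherwise
is_installed <- vapply(lesson_packages, requireNamespace, logical(1), quietly = TRUE)
print(is_installed)

# Names of any packages that still need install.packages()
still_missing <- lesson_packages[!is_installed]
print(still_missing)
```

Any package listed in still_missing can be installed with install.packages(), as described above.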