Summary and Setup

This is a lesson for learning how to create scientific data visualizations in R. The lesson uses examples from charts created by Black social scientist W.E.B. Du Bois and a diverse team of collaborators for the 1900 Paris World Expo.
The Du Bois team’s visualizations were state of the art in 1900, using most of the major chart types still employed across the sciences and engineering today. The Du Bois charts provided some of the first widely accessible scientific analyses to refute false, biologically based theories of racial inequality. For their innovative analyses, beauty, and historical importance, the original hand-drawn charts are preserved in the U.S. Library of Congress.
The lesson is designed for learners with little or no programming experience. The entire lesson can be taught in a two-day workshop. Alternatively, selected episodes from the lesson can be taught over a period of around 3 hours of lecture and lab within a STEM course (see Instructor Notes for suggested lesson plans). The lesson starts with the social and scientific context in which Du Bois created the visualizations. The lesson then introduces key chart types for different kinds of data and visualization best practices. Multiple episodes are then offered for recreating or adapting selected Du Bois charts with modern data, including bar graphs and statistical maps.
This lesson is part of a series of STEM Data Visualization and Du Boisian Methods modules for learning data visualization in R, Python, Stata, and more. Development and testing of the modules is funded by the National Science Foundation.
Overall Lesson Learning Objectives
Each episode of the lesson has more specific learning objectives. The overall lesson learning objectives are:
- Understand the social and scientific context of visualizations as a process of scientific discovery (observation, hypothesis formation, data collection, analysis).
- Practice creative and visual thinking as valuable methods for scientific discovery and communication.
- Comprehend major chart types (pie chart, bar chart, line chart, statistical map) and their suitability for different levels of measurement and multivariate relationships.
- Apply visualization best practices, including accessible design and data storytelling.
- Create and modify a Du Bois chart using R to experience the value of coding to reproduce and adapt data visualizations in STEM.
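As a rough preview of the kind of code involved in the last objective, the sketch below draws a simple horizontal bar chart. It assumes the ggplot2 package (part of the tidyverse) is already installed. The data frame, its column names, and its values are made up for illustration and are not taken from any Du Bois chart.

```r
# A minimal bar chart sketch; assumes ggplot2 is installed.
library(ggplot2)

# Made-up illustrative values, NOT data from the Du Bois charts.
example_data <- data.frame(
  category = c("A", "B", "C"),
  value = c(10, 25, 15)
)

# geom_col() draws one bar per row; coord_flip() makes the bars horizontal.
p <- ggplot(example_data, aes(x = category, y = value)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  labs(
    title = "Example bar chart (illustrative values)",
    x = NULL,
    y = "Value"
  ) +
  theme_minimal()

print(p)
```

Changing the values in example_data and rerunning the code redraws the chart, which is the basic idea behind reproducing and adapting a visualization with code.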
Getting started
This lesson assumes no prior knowledge of the skills or tools.
There are three ways to engage with the coding portions of this lesson:
1. Use Jupyter Notebooks with an R Kernel on the Du Bois Cloud Jupyter Hub.
2. Use Jupyter Notebooks with an R Kernel on students’ own computers.
3. Use RStudio on students’ own computers.
The Du Bois Cloud option can be done with any web browser and requires no installation. We recommend that option if students are not expecting to do further data visualization using R for their courses or research.
Prerequisites for using students’ own computers
For options 2 and 3, which use students’ own computers, follow the directions under “Setup instructions”.
For option 2, students will need Jupyter Lab, R, and a Jupyter R Kernel.
For option 3, students will need R and RStudio.
To most effectively use these materials, please make sure to install everything before working through this lesson.
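For option 2, one common way to make R available as a Jupyter kernel is the IRkernel package. The sketch below assumes that approach, which this lesson does not itself specify, and assumes Jupyter Lab is already installed and on the system path. The helper name register_r_kernel is hypothetical.

```r
# Assumption: IRkernel is used to provide the Jupyter R kernel.
# register_r_kernel is a hypothetical helper name used for illustration.
register_r_kernel <- function() {
  # Install IRkernel only if it is not already available.
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    install.packages("IRkernel")
  }
  # Register the kernel for the current user so Jupyter can find it.
  IRkernel::installspec(user = TRUE)
}
```

Calling register_r_kernel() once from an R console should be enough. After that, an R option would be expected to appear when creating a new notebook in Jupyter Lab.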
Resources
- R-Social Sci Index Reference: https://github.com/datacarpentry/r-socialsci/blob/main/index.md
- Component Guide for Building Out Formatting Boxes for GitHub
Data Sets
Download the data zip file and unzip it to your Desktop.
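The unzip step can also be done from within R. The sketch below is one possible approach using base R's unzip() function. The helper name unzip_lesson_data and the default Desktop location are assumptions for illustration, and the path to the downloaded zip file must be supplied by the user.

```r
# Hypothetical helper: extract a downloaded zip file into a folder.
# The default destination assumes a "Desktop" folder in the home directory.
unzip_lesson_data <- function(zip_path,
                              dest_dir = file.path(path.expand("~"), "Desktop")) {
  # Stop early with a message if the zip file is not where we expect it.
  if (!file.exists(zip_path)) {
    message("Zip file not found: ", zip_path)
    return(invisible(character(0)))
  }
  # unzip() returns the paths of the extracted files.
  extracted <- unzip(zip_path, exdir = dest_dir)
  invisible(extracted)
}
```

For example, files <- unzip_lesson_data("~/Downloads/data.zip") would extract a file with that (hypothetical) name and store the extracted file paths in files.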
Setup instructions
Coding exercises for this lesson can be done either 1) on your own computer using R and RStudio, or 2) using a Jupyter Notebook with an R Kernel on the Du Bois Cloud.
Using Notebooks on the Du Bois Cloud does not require any setup or installation. You can simply follow hyperlinks provided in this lesson that will open a Notebook in any web browser.
For those who want to do more data visualization and work with R beyond this lesson, we recommend using R and RStudio. If you have not yet installed R and RStudio, you can do so by following the instructions below.
R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once installed, because RStudio is an IDE, RStudio will run R in the background. You do not need to run it separately.
After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools within R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.
Windows
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check which version of R you are using, start RStudio and the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package, by running:
# Install installr if it is missing, then update R (installr is Windows-only).
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
installr::updateR(TRUE)
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Run the .exe file that was just downloaded.
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Windows. Vista/7/8/10 (where x, y, and z represent version numbers).
- Double click the file to install it.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
macOS
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check the version of R you are using, start RStudio and the first thing that appears on the terminal indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. In any case, make sure you have at least R 3.2.
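The version check described above can also be done programmatically. This is a minimal sketch using base R only. The minimum version string "3.2.0" is taken from the "at least R 3.2" requirement in the text, and the variable names are illustrative.

```r
# Minimum version named in the lesson text ("at least R 3.2").
min_version <- "3.2.0"

# Print the full version string of the running R installation.
message("Running ", R.version.string)

# getRversion() returns a version object that can be compared to a string.
meets_minimum <- getRversion() >= min_version

if (meets_minimum) {
  message("R meets the minimum version requirement.")
} else {
  message("R is older than ", min_version, "; please install a newer release from CRAN.")
}
```

The same check works on Windows, macOS, and Linux, whether it is run in the RStudio console or in a Jupyter Notebook with an R Kernel.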
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Select the .pkg file for the latest R version.
- Double click on the downloaded file to install R.
- It is also a good idea to install XQuartz (needed by some packages).
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
- Double click the file to install RStudio.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
- Follow the instructions for your distribution from CRAN; they provide information to get the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach as the versions provided by this approach are usually out of date. In any case, make sure you have at least R 3.2.
- Go to the RStudio download page.
- Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
- Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).
For everyone
After installing R and RStudio, you need to install the tidyverse and here packages.
After starting RStudio, at the console type:
install.packages("tidyverse")
followed by the enter key. Once this has installed, type
install.packages("here")
followed by the enter key. Both packages should now be installed.
For reference, the lesson uses SAFI_clean.csv. The direct download link for this file is: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned-up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.
The JSON episode uses SAFI.json. The file is available on GitHub here.
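To confirm that the tidyverse and here packages were installed successfully, a quick check along the following lines may help. This is a sketch using base R's requireNamespace(). It only reports missing packages and does not install anything, and the variable names are illustrative.

```r
# Packages the setup instructions ask you to install.
required_pkgs <- c("tidyverse", "here")

# requireNamespace() returns TRUE when a package can be loaded.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  # Report only; run install.packages() yourself for anything listed here.
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If any package is listed as missing, rerun the corresponding install.packages() command at the console and check the output for error messages.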