Summary and Schedule

This is a lesson for learning how to create scientific data visualizations in R. The lesson uses examples from charts created by Black social scientist W.E.B. Du Bois and a diverse team of collaborators for the 1900 Paris World Expo.
The Du Bois team’s visualizations were state of the art in 1900, using most of the major chart types still employed across the sciences and engineering today. The Du Bois charts provided some of the first widely accessible scientific analyses to refute false, biologically based theories of racial inequality. For their innovative analyses, beauty, and historical importance, the original hand-drawn charts are preserved in the U.S. Library of Congress.
The lesson is designed for learners with little or no programming experience. The entire lesson can be taught in a two-day workshop. Alternatively, selected episodes from the lesson can be taught in around 3 hours of lecture and lab within a STEM course.
This lesson is part of a series of Learning STEM Data Visualization with Du Bois modules for learning data visualization in R, Python, Stata, and more. Development and testing of the modules is funded by the National Science Foundation.
Overall Lesson Learning Objectives
Each episode of the lesson has more specific learning objectives. The overall lesson learning objectives are:
- Identify how data visualization promotes scientific discovery.
- Appreciate early innovations in data visualization by W.E.B. Du Bois and other Black and women scientists in his Atlanta University Laboratory.
- Analyze the appropriate types of data visualization charts for different kinds of measurements, relationships, and scientific findings.
- Engage in a creative process of data visualization in the style of W.E.B. Du Bois using R.
Getting started
This lesson assumes no prior knowledge of the skills or tools.
There are three ways to engage with the coding portions of this lesson:
- Use RStudio or your preferred R editor on your own computer. Recommended for students who already use R or plan to continue doing so.
- Use a Jupyter Lite Du Bois Notebook with an R kernel in your web browser with no installation, or use another Jupyter Lite tool with R. Recommended for students who do not already use R but may do so more in the future.
- Use JupyterLite guided tutorials at the Du Bois Module Site. Recommended when instructors want beginner students to be able to do exercises independently without an instructor present.
| | Setup Instructions | Download files required for the lesson |
| Duration: 00h 00m | 1. Data Visualization Now | How can data visualization and creativity help answer important scientific questions? Why did data visualization become predominant in the social sciences earlier than in the physical and natural sciences? How did Du Bois use data visualization to challenge false biological theories of racial inequality? How did team science help Du Bois’ team to create impactful visualizations for the 1900 Paris exposition? |
| Duration: 00h 20m | 2. Using R with R Studio | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
| Duration: 01h 00m | 3. Reading and Interpreting STEM Charts | What are the major STEM chart types, all used by Du Bois? What universal design practices can make charts more accessible and effective? How did Du Bois use these practices effectively in one of his charts? |
| Duration: 01h 20m | 4. Recreate a Du Bois Bar Chart | How can I read tabular data to plot a bar graph in R? How can I use ggplot to organize and format a bar graph in R? How can I maintain a reproducible record of my data visualizations? How can I use color, text, and dimensions to change the aesthetics of a data visualization? |
| Duration: 01h 56m | 5. Adapt: Biodiversity and Redlining Bar Chart | How can we adapt code for a previous bar chart to plot different data? What can go wrong when we recycle code? How can we refine code to fit the particular data and relationships? |
| Duration: 02h 12m | 6. AI assisted plotting with R | What are potential accessibility benefits of AI for data visualization? How can you use AI in ways that improve your comprehension of visualizations and code? What are risks of using AI to help you do visualizations? |
| Duration: 02h 28m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
- The main goal here is to help the learners be comfortable with the RStudio interface.
- Go very slowly in the “Getting set up” section. Make sure everyone is following along (remind learners to use the stickies). Plan with the helpers at this point to go around the room, and be available to help. It’s important to make sure that learners are in the correct working directory, and that they create a data (all lowercase) subfolder.
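The working directory check and the data subfolder mentioned above can be sketched in a few lines of base R. This is only an illustration: the folder name dubois-lesson is a made-up placeholder, and the sketch works inside a temporary folder so it does not touch real files. In a workshop, learners would run getwd() and dir.create("data") from their own project folder instead.

```r
# Hypothetical project folder, created inside R's temporary directory
# so that this sketch never touches real files.
project_dir <- file.path(tempdir(), "dubois-lesson")
dir.create(project_dir, showWarnings = FALSE)

# Move into the project folder and remember where we came from.
old_dir <- setwd(project_dir)

# Show the current working directory so it can be checked by eye.
print(getwd())

# Create the all-lowercase data subfolder only if it is not already there.
if (!dir.exists("data")) {
  dir.create("data")
}
has_data_folder <- dir.exists("data")

# List the folder contents; "data" should appear.
print(list.files())

# Return to the original working directory.
setwd(old_dir)
```

Checking dir.exists() before dir.create() means the lines can be re-run without a warning when the folder already exists.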
Data Sets
Links to data sets are provided within each episode so that they can be read into R data frames directly from their source URLs.
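As a rough sketch of what reading a data set into a data frame looks like, the example below uses base R's read.csv(), which accepts either a file path or a URL as its first argument. The tiny CSV and its column names are invented stand-ins so the example runs offline; in the lesson, csv_source would be the URL given in the episode.

```r
# Stand-in for a lesson data set: a tiny, made-up CSV written to a temporary file.
# In an episode, csv_source would instead be the https:// URL provided there.
csv_source <- tempfile(fileext = ".csv")
writeLines(c("group,value", "a,10", "b,25", "c,15"), csv_source)

# read.csv() takes a file path or a URL and returns a data frame.
chart_data <- read.csv(csv_source, stringsAsFactors = FALSE)

# Inspect the structure and the first rows before plotting anything.
str(chart_data)
head(chart_data)
```

Inspecting the result with str() and head() right after reading is a cheap way to catch a wrong link or an unexpected column type early.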
Setup instructions for RStudio
For students using RStudio on their own computers, follow these directions if you have not already installed R and RStudio.
R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once installed, because RStudio is an IDE, RStudio will run R in the background. You do not need to run it separately.
After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools within R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.
Windows
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check which version of R you are using, start RStudio and the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go on the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package, by running:
R
# Install installr only if it is missing, then use it to update R (Windows only).
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
installr::updateR(TRUE)
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Run the .exe file that was just downloaded.
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
- Double click the file to install it.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
macOS
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check the version of R you are using, start RStudio and the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go on the CRAN website and check whether a more recent version is available. If so, please download and install it. In any case, make sure you have at least R 3.2.
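The version check described above can also be done from the console. The sketch below uses base R's R.version.string and getRversion(); the 3.2 minimum comes from the instructions above, while the variable names are only illustrative.

```r
# Minimum R version named in the setup instructions.
minimum_version <- "3.2.0"

# Human-readable version banner, similar to what RStudio shows at startup.
print(R.version.string)

# getRversion() returns a version object that can be compared with a string.
current_version <- getRversion()
is_recent_enough <- current_version >= minimum_version

if (is_recent_enough) {
  message("R ", as.character(current_version), " meets the minimum of ", minimum_version, ".")
} else {
  message("R ", as.character(current_version), " is older than ", minimum_version, "; please update R.")
}
```

Comparing version objects avoids the mistakes that plain text comparison would make, such as treating 3.10 as older than 3.2.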
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Select the .pkg file for the latest R version.
- Double click on the downloaded file to install R.
- It is also a good idea to install XQuartz (needed by some packages).
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
- Double click the file to install RStudio.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
- Follow the instructions for your distribution from CRAN, which provides information on getting the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach because the versions it provides are usually out of date. In any case, make sure you have at least R 3.2.
- Go to the RStudio download page.
- Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
- Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).
For everyone
After installing R and RStudio, you need to install the tidyverse and here packages. After starting RStudio, at the console type: install.packages("tidyverse") followed by the enter key. Once this has installed, type install.packages("here") followed by the enter key. Both packages should now be installed.
For reference, the lesson uses SAFI_clean.csv. The direct download link for this file is: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.
The JSON episode uses SAFI.json. The file is available on GitHub here.
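To confirm that the tidyverse and here packages from the installation step are actually available, a small check like the following may help. It is a sketch, not part of the lesson: missing_packages() is a hypothetical helper written for this example, and the install.packages() call is left commented out so that running the check never installs anything by surprise.

```r
# Packages the setup instructions ask for.
needed <- c("tidyverse", "here")

# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing packages
} else {
  message("All required packages are installed.")
}
```

Using requireNamespace() rather than library() checks availability without attaching the packages to the current session.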