Recreate a Du Bois Bar Chart

Last updated on 2026-04-16 | Edit this page

Overview

Questions

  • How can I read tabular data to plot a bar graph in R?
  • How can I use ggplot to organize and format a bar graph in R?
  • How can I maintain a reproducible record of my data visualizations?
  • How can I use color, text, and dimensions to change the aesthetics of a data visualization?

Objectives

  • Create bar graph variations based on historical tabular data.
  • Develop basic R code to create bar graphs.
  • Use transparent, legible, and shareable record of how you created your own unique data visualizations.
  • Differentiate and use the Du Bois theme in making data visuals.
Callout

This interactive exercise is inspired by the annual #DuBoisChallenge. The #DuBoisChallenge is a call to scientists, students, and community members to recreate, adapt, and share on social media the data visualzations created by W.E.B. Du Bois and his collaborators in 1900. Before doing the interactive exercise, please read this article about the Du Bois Challenge.

Black Literacy After Emancipation


Plate 47

Presentation of the historical cross-national literacy data


Du Bois’ created many data portraits to share the story of Black Americans post-emancipation. Plate 47 also known as the “Black Literacy After Emancipation” graphic at the top of this page shows mass education as one important strategy for furthering and deepening emancipation for Black Americans and others. In this workbook, you will recreate Du Bois’ visualization of Black illiteracy rates in the US compared to illiteracy rates in other countries using 1900 data.

An important context of Du Bois’s graph of Black illiteracy is that literacy was illegal for enslaved people in the U.S. until emancipation and the Confederacy’s defeat during the Civil War. Illiteracy then declined rapidly as Black Americans sought to empower themselves through education. Du Bois plotted this decline in illiteracy among Black residents in the state of Georgia in the figure below. This graph used decennial US census illiteracy rates for Georgia from 1860 to 1890 that are available here. They likely wrote “50%?” for the 1900 illiteracy rate because the Census did not publish 1900 illiteracy rates (available here) until several months after the Paris Exposition.

Plate 14

How can I input tabular data to plot a bar graph in R?


The first step for data visualization in R is to read data into an R dataframe. This is like double clicking a file to open it in other computer programs. But with R, we use code.

For this exercise, we’re going to read in data from a website. And we’re going to place the data into a dataframe named d_literacy_country.

There is no record of the exact data used by the Du Bois team for this bar graph. And the Du Bois graph curiously does not include tick marks with a labeled axis scale to show what exact values each bar represents. Why? Perhaps the Du Bois team wanted to emphasize that the bar graph was a rough comparison of illiteracy rates because of varied timing, methods, and national boundaries for measuring illiteracy rates at the time. The length of the “Negroes U.S.A” bar likely represents the national Black illiteracy rate of 57.1% reported by the 1890 US Census (see reported “Russie” (Russia) bar correspond to the national US Black illiteracy rate in the 1890 US Census (see here)). So our data derives illiteracy rates for other countries based on the length of each country’s bar relative to “Negroes U.S.A.” bar, presuming the “Negroes U.S.A.” bar represents 57.1%.

The R code to read in this data uses an <- arrow pointed at the name of the data frame and the read.csv() function command followed by the web address within parentheses where a csv (comma separated values) data file is located. It looks like this:

R

df <- read.csv("web_address_with_data.csv")

After writing this code, we can write the name of the data frame

R

df

Typing just the name of the dataframe will list all of the data in the data frame.

Challenge

Challenge 1: Reading the data

The web address of the data is: https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv

Update the code below using the web address above.

d_literacy_country <- read.csv("web_address_with_data/data_file_name.csv")

d_literacy_country

R

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

d_literacy_country

OUTPUT

           country illiteracy
1         Roumanie    73.3216
2           Servie    72.6727
3           Russie    72.6727
4  Negroes, U.S.A.    57.1000
5          Hongrie    55.8023
6           Italie    44.1227
7         Autriche    35.0386
8          Ireland    25.3057
9           France    12.9773
10           Suede     0.6489

Recreating a Bar Graph


After successfully reading the data above, you should be able to see that the data has two columns.

d_literacy_country

column_name Description
country Name of country
illiteracy illiteracy percentage

Each column is a variable:

  • country is a country name for 10 countries and with Black people in the U.S. treated as a country.

  • illiteracy contains percent of people in each country who are illiterate.

Before creating a bar graph of the data, we need to read the library ggplot2 and set up a couple parameters.

R

library(ggplot2)

The line of code opens the library of ggplot2.

The following code creates a simple bar graph using ggplot2 where df represents the dataframe.

R

ggplot(df, aes(x=horizon_variable, y=vertical_variable)) +
  geom_col()

R

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes( 
  x = country, 
  y = illiteracy)) +
  geom_col()

Within this expression, we set the parameters of which variable to be placed across the horizontal (using x=variable) and vertical axes (using y=variable). Typically, bar graphs have categories on the horizontal axis and the values on the vertical axis. However, there are instances where we want to create a horizontal bar graph where categorical values are on the vertical axis and the values are on the horizontal axis. Which bar graph did Du Bois used in Plate 47 presented at the top of the page?

Challenge

Challenge 2: Horizonal Bar graph

Below is the code to make a traditional bar graph. How can you modify the code in order to make it a horizontal bar graph?

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
  x = country,
  y = illiteracy)) +
  geom_col()

R

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
    x = illiteracy,
    y = country)) +
    geom_col() 

Now, our plot is starting to look more like the Du Bois original data creation. In the code from challenge 2, the categories are on x-axis (horizontal) and the rates/numbers are expressed on the y-axis (vertical axis). In the challenge above, we use the ggplot function to plot a horizontal bar graph of illiteracy rates across the observations (countries and Black Americans).

In the challenge code above, we add a ggplot function followed by open parentheses ( to tell R that we will plot data from the d_literacy_country data frame with an “aesthetic mapping” aes() specification that maps one column of data on the x axis and another column of data on the y axis.

After the close parentheses ) that tells ggplot we want to plot d_literacy_country data with one variable on the x axis, and another on the y axis, we add a + notation. When using multi-line code with ggplot, the + tells R we have more code to read. Specifically here, the geom_col() function tells ggplot we want a bar graph based on summary statistics in the dataframe.

Sorting the Values & Highlighting a value


In the bar graph you created above, can you tell what order the bars for each country are sorted by?…

It is ordered in alphabetical order. When we observe Du Bois Plate 47, we see Du Bois sorts the bar by illiteracy rate from highest to lowest.

Additionally, Du Bois also graphs illiteracy rate for Black Americans in a different color to make it easier to compare to other countries.

R

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy))) +
    geom_col()

Compare the code above with challenge 2, what do you notice? The code above uses the reorder function to reorganize the country names on the vertical axis based on illiteracy.

Next, we the fill function to highlight a specific country: Ireland.

R

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Ireland"
  )) +
  geom_col()
Challenge

Challenge 3: Reorder and highlighting a value

Observing Plate 47, what country or value is suppose to stand out? Update the below.

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
  x = illiteracy,
  y = reorder(country, illiteracy),
  fill = country == "Ireland"
  )) +
  geom_col()

R

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
  )) +
  geom_col()

Bar width and bar colors


We can easily edit the bar width by inputting numeric values within the

R

geom_col(width = #)

Below is the code with width .1.

R

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
  )) +
  geom_col(width=.1)

Below is the code with width 1.

R

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
  )) +
  geom_col(width=1)

Notice the differences?

Additionally, the function scale_fill_manual() allows users to adapt the colors of the fill based on values. Remember earlier, the legend printed colors based on TRUE/FALSE of country == “Negroes, U.S.A.”. We will build on that previous knowledge to adapt the code:

R

scale_fill_manual(values = c("TRUE"= "purple", "FALSE" = "orange"))

R

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Ireland"
)) +
  geom_col(width=.5) +
  scale_fill_manual(values = c("TRUE"= "purple", "FALSE" = "orange"))
Challenge

Challenge 4: Bar width and bar color

Edit the code to include an appropriate width size. Below the bar graph uses purple to high Black Americans and the other countries are orange. In the original Plate 47, red is used for Black Americans and darkgreen for the other countries? Update the code.

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
  x = illiteracy,
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  scale_fill_manual(values = c("TRUE"= "purple", "FALSE" = "orange"))
  geom_col(width=.1)

R

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  scale_fill_manual(values = c("TRUE"= "red", "FALSE" = "darkgreen")) +
  geom_col(width=.5)

##Adding Du Bois Theme

The style of our current figure does not quite match the original Plate 47. Can you identify 1-2 style characteristics missing?…

A few features missing are: dimensions, background color, and title. Next, we will adapt the code to match the style, specifically: dimensions, background color, legend, and other default ggplot elements.

R

options(repr.plot.width=22/3, repr.plot.height=28/3)

source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

Depending on our intention or circumstance, we adapt the size of a graph. The first line of code tells R that we want the width and height of the graph to have the same ratio that Du Bois used, 22 inches wide by 28 inches tall, with each divided by 3 so that it doesn’t display too big.

The second line tells R to use a specific Du Bois style. We can add theme_dubois() within geom_col(width = #) + theme_dubois(). However, make sure to include source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R") in order to use theme_dubois()

Changing the font to theme(text = element_text('serif'))

R

source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Ireland"
)) +
  geom_col(width=.5) +
  theme_dubois() +
  scale_fill_manual(values = c("TRUE"= "purple", "FALSE" = "orange"))+
  theme(text = element_text('serif'))
Challenge

Challenge 5: Using Du Bois style

Update the code with the options and the theme_dubois().

library(ggplot2)

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
  x = illiteracy,
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  geom_col(width=.5) +
  scale_fill_manual(values = c("TRUE"= "red", "FALSE" = "darkgreen")) +
  theme(text = element_text('FONT'))

R

library(ggplot2)
options(repr.plot.width=22/3, repr.plot.height=28/3)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  geom_col(width=.5) +
  theme_dubois() +
  scale_fill_manual(values = c("TRUE"= "red", "FALSE" = "darkgreen"))+
  theme(text = element_text('serif'))

Titles


To add titles and subtitles to the graph, we use the labs function in ggplot. We use the title and subtitle specifications with labs.

The title text needs to be enclosed in quotation marks. We use the code \n to tell R to put a “new line” break at different places in the title based on Du Bois’ titling.

Fill in the blank with your name in the title code below to show that the graph was recreated by you?

labs(
	title="Graph title"
	subtitle="2026"
)
Challenge

Challenge 6: Adding labels

Based on the Plate 47, update with your name.

library(ggplot2)
options(repr.plot.width=22/3, repr.plot.height=28/3)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes(
  x = illiteracy,
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  geom_col(width=.5) +
  theme_dubois() +
  scale_fill_manual(values = c("TRUE"= "red", "FALSE" = "darkgreen"))+
  theme(text = element_text('serif'))+
    labs(
        title = "\nIlliteracy of the American Negroes compared with that of other nations.\n",
        subtitle = "Proportion d' illettrés parmi les Nègres Americains comparée à celle des autres nations.\n\n
        Done by Atlanta University.\n\ngit ad
        Recreated by STUDENT NAME HERE\n\n"
    )

R

library(ggplot2)
options(repr.plot.width=22/3, repr.plot.height=28/3)
source("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/theme_dubois.R")

d_literacy_country <- read.csv("https://raw.githubusercontent.com/HigherEdData/Du-Bois-STEM/refs/heads/main/data/d_literacy_country.csv")

ggplot(d_literacy_country, aes( 
  x = illiteracy, 
  y = reorder(country, illiteracy),
  fill = country == "Negroes, U.S.A."
)) +
  geom_col(width=.5) +
  theme_dubois() +
  scale_fill_manual(values = c("TRUE"= "red", "FALSE" = "darkgreen"))+
  theme(text = element_text('serif')) +
  
  labs(
      title = "\nIlliteracy of the American Negroes compared with that of other nations.\n",
      subtitle = "Proportion d' illettrés parmi les Nègres Americains comparée à celle des autres nations.\n\n
      Done by Atlanta University.\n\n
      Recreated by STUDENT NAME HERE\n\n"
    )
Key Points
  • Learn about early Du Bois data visualization
  • Use R to read tabular data
  • Use ggplot2 to create graphs
  • Use modern features to recreate Plate 47