Bahlai’s Law says,
"Other people's data is always inconsistent and in the wrong format."
We open this lesson by taking data from three different sources, one of them scraped by learners in the Library Carpentry web scraping lesson, and joining them together to answer a research question.
We will then take one such dataset, a bibliography embedded in a spreadsheet, and turn it into something usable.
Along the way, we will use all of the tools introduced so far to extract, reformat, and analyze information that would otherwise be difficult or impossible to work with.
Prerequisites
Learners should have completed introductory lessons on:
- the Unix shell (head and tail, word count, sorting, and pipes)
- Git (setting up a repository, committing files)
- Python (libraries, loops, list indexing, string formatting)
- SQL (creating tables, inserting data, joins)