Summary and Schedule
Welcome
This lesson teaches the fundamentals of Natural Language Processing (NLP) in Python. It will equip you with the foundational skills and knowledge needed to carry out text-based research projects. The lesson is designed with researchers in the Humanities and Social Sciences in mind, but is also applicable to other fields of research.
On the first day we will dive into text preprocessing and word embeddings while exploring semantic shifts in various words over multiple decades. The second day begins with an introduction to transformers, and we will work on classification and named entity recognition with the BERT model. In the afternoon, we will cover large language models, and you will learn how to build your own agents.
Prerequisites
Before joining this course, participants should have:
- Basic Python programming skills
Do you want to teach this lesson? Find more help in the README Feel free to reach out to us with any questions that you have. Just open a new issue. We also value any feedback on the lesson!
| Duration | Episode | Key questions |
| --- | --- | --- |
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction | What is Natural Language Processing? What are some common applications of NLP? What makes text different from other data? Why not just learn Large Language Models? What linguistic properties should we consider when dealing with texts? |
| 02h 00m | 2. From words to vectors | What operations should I perform to get clean text? What properties do word embeddings have? What is a word2vec model? What insights can I get from word embeddings? How do I train my own word2vec model? |
| 04h 00m | 3. Transformers: BERT and Beyond | What are some drawbacks of static word embeddings? What are Transformers? What is BERT and how does it work? How can I use BERT to solve NLP tasks? How should I evaluate my classifiers? Which other Transformer variants are available? |
| 06h 00m | 4. Using large language models | |
| 07h 00m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Software Setup
Installing Python
Python is a popular language for scientific computing, and a frequent choice for machine learning as well. To install Python, follow the Beginner’s Guide or head straight to the download page.
Please set up your Python environment at least a day in advance of the workshop. If you encounter problems with the installation procedure, ask your workshop organizers for assistance via e-mail so you are ready to go as soon as the workshop begins.
Installing the required packages
Pip is the package management system built into Python; it should be available on your system once you have installed Python successfully. Please note that installing the packages can take some time, in particular on Windows.
Open a terminal (Mac/Linux) or Command Prompt (Windows) and run the following commands.
- Create a virtual environment called nlp_workshop:

Mac/Linux:
python3 -m venv nlp_workshop

Windows:
py -m venv nlp_workshop
- Activate the newly created virtual environment:
Mac/Linux:
source nlp_workshop/bin/activate

Windows:
nlp_workshop\Scripts\activate
Remember that you need to activate your environment every time you restart your terminal!
- Install the required packages:
Mac/Linux:
python3 -m pip install jupyterlab jieba spacy gensim matplotlib transformers

Windows:
py -m pip install jupyterlab jieba spacy gensim matplotlib transformers
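To confirm the installation worked, you can run a short check like the one below (a minimal sketch; the package list mirrors the pip command above, and the helper name `missing_packages` is our own):

```python
from importlib.util import find_spec

# Importable module names for the packages installed above.
REQUIRED = ["jupyterlab", "jieba", "spacy", "gensim", "matplotlib", "transformers"]

def missing_packages(names):
    """Return the subset of names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages installed.")
```

If any package is reported missing, re-run the pip command above inside the activated environment.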
Jupyter Lab
We will teach using Python in Jupyter Lab, a programming environment that runs in a web browser. Jupyter Lab is compatible with Firefox, Chrome, Safari and Chromium-based browsers. Note that Internet Explorer and Edge are not supported. See the Jupyter Lab documentation for an up-to-date list of supported browsers.
To start Jupyter Lab, open a terminal (Mac/Linux) or Command Prompt (Windows) and type the command:
jupyter lab
Ollama
We will use Ollama to run large language models. The installer (available for Linux/Windows/Mac OS) can be downloaded here:
Run the installer and follow the instructions on screen.
Next, download the model that we will be using from a terminal (Mac/Linux) or Command Prompt (Windows) by typing the command:
ollama pull llama3.2:1b
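Once the model is pulled and the Ollama application is running (by default it serves a REST API on http://localhost:11434), it can be queried from Python. Below is a minimal sketch, assuming the default port and the llama3.2:1b model pulled above; the helper names are our own:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires the Ollama server to be running):
# print(generate("llama3.2:1b", "In one sentence, what is NLP?"))
```

We will build on this kind of programmatic access when working with large language models and agents on the second day.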
Data Sets
Datasets and example files are placed in the episodes/data/ directory.
You can manually download each of the 4 .txt files by clicking on them and using the down-arrow button (“Download raw file”) in the upper right corner of the screen, below the word “History”.
You should also manually download the notebook template available in the learners/notebook directory.
The 4 text files and the notebook should be placed together in the same directory.
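A quick way to verify that everything ended up in the same place is a small check like this (a sketch; the filenames below are hypothetical placeholders, so substitute the actual names of the files you downloaded):

```python
from pathlib import Path

def missing_files(directory, names):
    """Return the files from names that are not present in directory."""
    d = Path(directory)
    return [n for n in names if not (d / n).is_file()]

# Hypothetical filenames -- replace with the actual downloaded files.
EXPECTED = ["text1.txt", "text2.txt", "text3.txt", "text4.txt", "template.ipynb"]

# Example: check the current working directory.
# print(missing_files(".", EXPECTED))
```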
Word2Vec
Download the Word2Vec models trained on data from six Dutch national newspapers spanning the period 1950 to 1989 (Wevers, M., 2019). These models are available on Zenodo.

In addition, download the pretrained word2vec-google-news-300 model with gensim's downloader. Open a terminal (Mac/Linux) or Command Prompt (Windows) and type the command:

Mac/Linux:
python3 -m gensim.downloader --download word2vec-google-news-300

Windows:
py -m gensim.downloader --download word2vec-google-news-300
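Once a word2vec model is loaded, questions like "how similar are two words?" reduce to cosine similarity between their vectors. A minimal sketch with tiny hand-made vectors (real embeddings have hundreds of dimensions; the numbers here are purely illustrative, not taken from any trained model):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (illustrative only).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower
```

gensim exposes the same idea through methods such as `most_similar` on a loaded model, which we will use during the lesson.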
Spacy English
Download the trained pipelines for English from Spacy. To do so, open a terminal (Mac/Linux) or Command Prompt (Windows) and type the command:
Mac/Linux:
python3 -m spacy download en_core_web_sm

Windows:
py -m spacy download en_core_web_sm