This lesson is still being designed and assembled (Pre-Alpha version)

Text Analysis in Python

Welcome to the Text Analysis workshop for Python! Below is the list of lessons, each with a brief summary of the questions it covers.


Python experience is required for this workshop.


Setup
    Download files required for the lesson
00:00  1. Introduction to Natural Language Processing
    What is Natural Language Processing?
    What tasks can be done by Natural Language Processing?
    What does a workflow for an NLP project look like?
00:35  2. Corpus Development - Text Data Collection
    How do I evaluate what kind of data to use for my project?
    What do I need to consider when building my corpus?
01:15  3. Preparing and Preprocessing Your Data
    How can I prepare data for NLP?
    What are tokenization, casing and lemmatization?
01:35  4. Vector Space and Distance
    How can we model documents effectively?
    How can we measure similarity between documents?
    What’s the difference between cosine similarity and distance?
02:15  5. Document Embeddings and TF-IDF
    What is a document embedding?
    What is TF-IDF?
02:45  6. Latent Semantic Analysis
    What is topic modeling?
    What is Latent Semantic Analysis (LSA)?
03:15  7. Intro to Word Embeddings
    How can we extract vector representations of individual words rather than documents?
    What sort of research questions can be answered with word embedding models?
04:00  8. The Word2Vec Algorithm
    How does the Word2Vec model produce meaningful word embeddings?
    How is a Word2Vec model trained?
04:45  9. Training Word2Vec
    How can we train a Word2Vec model?
    When is it beneficial to train a Word2Vec model on a specific dataset?
05:50  10. Finetuning LLMs
    How can I fine-tune preexisting LLMs for my own research?
    How do I pick the right data format?
    How do I create my own labels?
    How do I put my data into a model for fine-tuning?
    How do I evaluate success at my task?
07:50  11. Ethics and Text Analysis
    Is text analysis artificial intelligence?
    How can training data influence results?
    What are the risk zones to consider when using text analysis for research?
08:30  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.