Welcome to the Text Analysis workshop for Python! Below is the list of lessons, each with a brief summary.
Prerequisites
Python experience is required for this workshop.
Setup | Download files required for the lesson | |
00:00 | 1. Introduction to Natural Language Processing |
What is Natural Language Processing?
What tasks can be done by Natural Language Processing? What does a workflow for an NLP project look like? |
00:35 | 2. Corpus Development - Text Data Collection |
How do I evaluate what kind of data to use for my project?
What do I need to consider when building my corpus? |
01:15 | 3. Preparing and Preprocessing Your Data |
How can I prepare data for NLP?
What are tokenization, casing and lemmatization? |
01:35 | 4. Vector Space and Distance |
How can we model documents effectively?
How can we measure similarity between documents? What’s the difference between cosine similarity and distance? |
02:15 | 5. Document Embeddings and TF-IDF |
What is a document embedding?
What is TF-IDF? |
02:45 | 6. Latent Semantic Analysis |
What is topic modeling?
What is Latent Semantic Analysis (LSA)? |
03:15 | 7. Intro to Word Embeddings |
How can we extract vector representations of individual words rather than documents?
What sort of research questions can be answered with word embedding models? |
04:00 | 8. The Word2Vec Algorithm |
How does the Word2Vec model produce meaningful word embeddings?
How is a Word2Vec model trained? |
04:45 | 9. Training Word2Vec |
How can we train a Word2Vec model?
When is it beneficial to train a Word2Vec model on a specific dataset? |
05:50 | 10. Finetuning LLMs |
How can I fine-tune preexisting LLMs for my own research?
How do I pick the right data format? How do I create my own labels? How do I put my data into a model for fine-tuning? How do I evaluate success at my task? |
07:50 | 11. Ethics and Text Analysis |
Is text analysis artificial intelligence?
How can training data influence results? What are the risk zones to consider when using text analysis for research? |
08:30 | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.