Introduction


  • Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that draws on Linguistics to process, understand, and generate natural language

  • Linguistic data has special properties that we should consider when modeling our solutions

  • Key tasks include language modeling, text classification, token classification and text generation

  • Deep learning has significantly advanced NLP, but handling the discrete and ambiguous nature of language remains a challenge

  • The ultimate goal of NLP is to enable machines to understand and process language as humans do

From words to vectors


  • We can represent text as vectors of numbers, which makes it processable by machines
  • We can run a preprocessing pipeline to obtain clean words that can be used as features
  • We can easily compute how similar two words are by taking the cosine similarity of their vectors
  • Using Gensim, we can train our own Word2Vec models
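A preprocessing pipeline can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the lesson's own pipeline: the tokenization regex and the tiny `STOP_WORDS` set are hypothetical choices, and real projects usually take tokenizers and stop-word lists from a library such as spaCy or NLTK.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration);
# real pipelines usually use a much fuller list from an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    lowered = text.lower()
    # Keep runs of ASCII letters, allowing one inner apostrophe ("don't");
    # punctuation, digits, and non-ASCII characters are discarded.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The cat sat on the mat, and the dog barked!"))
    # ['cat', 'sat', 'mat', 'dog', 'barked']
```

The resulting token lists are what later steps (counting, vectorizing, or training a word2vec model) would consume as features.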
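One simple way to turn preprocessed text into vectors of numbers is a bag-of-words count vector, sketched below. This is an assumed illustration rather than the representation the lesson necessarily uses: `build_vocabulary` and `bag_of_words` are hypothetical helper names, and word2vec models instead learn dense vectors for individual words.

```python
from collections import Counter


def build_vocabulary(documents: list[list[str]]) -> list[str]:
    """Collect every distinct token across all documents, in sorted order."""
    return sorted({token for doc in documents for token in doc})


def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Map a token list to a count vector whose positions follow `vocabulary`."""
    counts = Counter(tokens)  # missing words count as 0
    return [counts[word] for word in vocabulary]


# Already-preprocessed toy documents (made up for illustration).
docs = [["cat", "sat", "mat"], ["dog", "sat", "mat", "mat"]]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)    # ['cat', 'dog', 'mat', 'sat']
print(vectors)  # [[1, 0, 1, 1], [0, 1, 2, 1]]
```

Each document becomes a fixed-length list of integers, which is the kind of numeric input a machine learning model can work with; the trade-off is that word order and meaning are ignored.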
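Cosine similarity compares two vectors by the angle between them: cos(u, v) = (u · v) / (‖u‖ ‖v‖), which ranges from -1 to 1. The sketch below applies it to made-up three-dimensional word vectors; `TOY_VECTORS`, `cosine_similarity`, and `most_similar` are hypothetical names, and the numbers are illustrative, not output from a trained model. A trained Gensim model exposes the same measure through `model.wv.similarity(word1, word2)`.

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only; real word2vec
# vectors are learned from a corpus and usually have far more dimensions.
TOY_VECTORS = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, where the cosine is undefined
    return dot / (norm_u * norm_v)


def most_similar(word: str, vectors: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("king", TOY_VECTORS):
        print(f"{other}: {score:.3f}")
```

Here "queen" ranks above "apple" for "king" because the two toy royalty vectors point in nearly the same direction; cosine similarity ignores vector length and looks only at direction.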

Transformers: BERT and Beyond

  • Transformers
  • BERT
  • BERT Architecture
  • BERT as a Language Model
  • BERT for Text Classification
  • BERT for Token Classification


Episode 3: Using large language models

References