Introduction


  • NLP is a subfield of Artificial Intelligence (AI) that draws on linguistics to develop approaches for processing, understanding and generating natural language
  • Linguistic data has special properties that we should consider when designing our solutions
  • Key tasks include language modeling, text classification, token classification and text generation
  • Deep learning has significantly advanced NLP, but handling the discrete and ambiguous nature of language remains a challenge
  • The ultimate goal of NLP is to enable machines to understand and process language as humans do

From words to vectors


  • The first step in working with text is to run a preprocessing pipeline to obtain clean features
  • We can represent text as vectors of numbers (which makes it processable by machines)
  • One of the most efficient and useful representations is word embeddings
  • We can easily measure how similar two words are by computing the cosine similarity of their vectors
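
The points above can be illustrated with a minimal sketch in plain Python. It is not the lesson's own code: the helper names (`preprocess`, `bag_of_words`, `cosine_similarity`) and the three-dimensional `toy_embeddings` are hypothetical and invented for illustration. Real word embeddings are learned from large corpora and usually have hundreds of dimensions.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Step 1 and 2: preprocess two sentences and turn them into count vectors.
tokens_a = preprocess("The cat sat on the mat.")
tokens_b = preprocess("The dog sat on the rug!")
vocabulary = sorted(set(tokens_a) | set(tokens_b))
vector_a = bag_of_words(tokens_a, vocabulary)
vector_b = bag_of_words(tokens_b, vocabulary)
print("sentence similarity:", round(cosine_similarity(vector_a, vector_b), 3))

# Step 3 and 4: compare words through (hypothetical, hand-made) embeddings.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
sim_cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print("cat vs dog:", round(sim_cat_dog, 3))
print("cat vs car:", round(sim_cat_car, 3))
```

With these made-up vectors, "cat" and "dog" point in nearly the same direction and so score higher than "cat" and "car". Cosine similarity depends only on the angle between vectors, not on their length, which is one reason it is a common choice for comparing embeddings.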

Transformers: BERT and Beyond


  • Transformers
  • BERT
  • BERT Architecture
  • BERT as a Language Model
  • BERT for Text Classification
  • BERT for Token Classification


Episode 3: Using large language models


  • References