Introduction
- NLP is a subfield of Artificial Intelligence (AI) that draws on linguistics to process, understand and generate natural language
- Linguistic data has special properties that we should take into account when designing our models
- Key tasks include language modeling, text classification, token classification and text generation
- Deep learning has significantly advanced NLP, but handling the discrete and ambiguous nature of language remains a challenge
- The ultimate goal of NLP is to enable machines to understand and process language as humans do
From words to vectors
- The first step when working with text is to run a preprocessing pipeline that yields clean features
- We can represent text as vectors of numbers, so that machines can process it
- One of the most efficient and useful representations is word embeddings
- We can easily measure how similar two words are by computing the cosine similarity of their vectors
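
The points above can be illustrated with a minimal sketch in plain Python. It assumes a very simple preprocessing step (lowercasing and keeping alphabetic tokens) and uses a hypothetical table of 3-dimensional embeddings invented for this example; real embeddings are learned from data and have many more dimensions. The names `preprocess`, `cosine_similarity` and `toy_embeddings` are illustrative, not part of any library.

```python
import math
import re


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, made up for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

tokens = preprocess("The cat chased the dog.")
sim_cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(tokens)
print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```

With these made-up vectors, "cat" and "dog" point in nearly the same direction and score close to 1, while "cat" and "car" score much lower. Cosine similarity depends only on the angle between vectors, not on their length.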