Introduction
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that, with the help of Linguistics, deals with approaches to process, understand and generate natural language
Linguistic Data has special properties that we should consider when modeling our solutions
Key tasks include language modeling, text classification, token classification and text generation
Deep learning has significantly advanced NLP, but handling the discrete and ambiguous nature of language remains a challenge
The ultimate goal of NLP is to enable machines to understand and process language as humans do
From words to vectors
- The first step when working with text is to run a preprocessing pipeline to obtain clean features
- We can represent text as vectors of numbers (which makes it processable by machines)
- One of the most efficient and useful representations is word embeddings, which map each word to a dense vector
- We can easily compute how similar two words are with the cosine similarity of their vectors
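
The steps in this list can be sketched in a few lines of Python. This is a minimal illustration, not a reference pipeline: the helper names (`preprocess`, `bag_of_words`, `cosine_similarity`) are hypothetical, the preprocessing is reduced to lowercasing and splitting on letters, and the three-dimensional "embeddings" are made-up numbers rather than values from a trained model.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# Step 1: preprocessing turns raw text into tokens.
tokens = preprocess("The cat chased the dog.")

# Step 2: a simple count-based vector over the sentence's own vocabulary.
vocabulary = sorted(set(tokens))
print(vocabulary)
print(bag_of_words(tokens, vocabulary))

# Steps 3 and 4: compare word vectors with cosine similarity.
print(cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"]))
print(cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"]))
```

With these invented vectors, "cat" scores closer to "dog" than to "car", which is the behaviour we hope trained embeddings show for related words. Real embeddings have far more dimensions and are learned from data.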