Introduction
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that draws on linguistics to develop approaches for processing, understanding and generating natural language
Linguistic data has special properties that we should take into account when modeling our solutions
Key tasks include language modeling, text classification, token classification and text generation
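To make "language modeling" concrete: a language model assigns probabilities to word sequences, typically via the probability of the next word given the previous ones. The sketch below is a minimal count-based bigram model, chosen purely for illustration (it is not a deep-learning model). The boundary markers `<s>`/`</s>`, the toy corpus and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers

def train_bigram_lm(sentences):
    """Estimate P(next | prev) by maximum likelihood from bigram counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [BOS] + sentence.lower().split() + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a conditional probability distribution
    return {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def sentence_prob(lm, sentence):
    """Multiply bigram probabilities; any unseen bigram gives probability 0."""
    tokens = [BOS] + sentence.lower().split() + [EOS]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= lm.get(prev, {}).get(nxt, 0.0)
    return prob

corpus = ["the cat sat", "the cat ran", "the dog sat"]
lm = train_bigram_lm(corpus)
print(lm["the"])                         # P(cat | the) = 2/3, P(dog | the) = 1/3
print(sentence_prob(lm, "the cat sat"))  # 1 * 2/3 * 1/2 * 1 = 1/3
print(sentence_prob(lm, "the dog ran"))  # 0.0, because "dog ran" never occurs
```

The zero probability for the unseen sequence "the dog ran" shows why practical language models rely on smoothing or on neural networks that generalize beyond observed counts.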
Deep learning has significantly advanced NLP, but handling the discrete and ambiguous nature of language remains a challenge
The ultimate goal of NLP is to enable machines to understand and process language as humans do
From words to vectors
- We can represent text as vectors of numbers, which makes it something machines can process
- We can run a preprocessing pipeline to obtain clean words that can be used as features
- We can compute how similar two words are using the cosine similarity of their vectors
- Using gensim, we can train our own word2vec models
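For the preprocessing bullet, here is a minimal pipeline sketch assuming English text: lowercase, strip everything except letters, split on whitespace and drop stop words. The tiny `STOP_WORDS` set and the `preprocess` helper are hypothetical; real pipelines usually use a curated stop-word list and may add stemming or lemmatization.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for illustration
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}

def preprocess(text):
    """Lowercase, remove non-letters, tokenize on whitespace, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only ASCII letters and whitespace
    tokens = text.split()
    # Discard stop words and single-character leftovers
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]

print(preprocess("The cats are sitting on the mat, 3 times!"))
# ['cats', 'sitting', 'mat', 'times']
```

Note that the ASCII-only regex discards accented letters, so a multilingual corpus would need a different character filter.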
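One simple way to turn text into a vector of numbers, as in the first bullet, is a bag-of-words count vector: fix a vocabulary, give each word a column, and count occurrences. This is a sketch of that one technique (not the only representation); the helper names and toy documents are hypothetical, and the documents are assumed to be already tokenized.

```python
def build_vocab(docs):
    """Map each distinct token to a fixed column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def bag_of_words(doc, vocab):
    """Count vector: entry i holds how often vocabulary word i occurs in doc."""
    vec = [0] * len(vocab)
    for tok in doc:
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] += 1
    return vec

docs = [["cat", "sat", "mat"], ["dog", "sat", "log", "dog"]]
vocab = build_vocab(docs)  # {'cat': 0, 'dog': 1, 'log': 2, 'mat': 3, 'sat': 4}
print([bag_of_words(d, vocab) for d in docs])
# [[1, 0, 0, 1, 1], [0, 2, 1, 0, 1]]
```

Such vectors are sparse and ignore word order; word2vec-style embeddings instead give each word a short dense vector.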
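The cosine similarity mentioned above is cos(u, v) = (u · v) / (‖u‖ ‖v‖): it measures the angle between two vectors and ranges from -1 to 1. A minimal sketch in plain Python follows; the 3-dimensional word vectors are made up purely for illustration and do not come from any trained model.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||), in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# Hypothetical word vectors, invented for this example
vectors = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}
sim_king_queen = cosine_similarity(vectors["king"], vectors["queen"])  # ≈ 0.99
sim_king_apple = cosine_similarity(vectors["king"], vectors["apple"])  # ≈ 0.31
print(sim_king_queen, sim_king_apple)
```

Because cosine similarity depends only on direction, scaling a vector does not change the result, which is why it is preferred over raw dot products when vector lengths vary.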