Word Embedding – Word2Vec
The key goal of NLP is to understand language. Text and speech are our primary media of communication, but computer systems operate only on numbers, so we need a process that transforms text or words into numeric vectors; this process is called word embedding. Here we discuss Word2Vec, introduced in 2013 by a team of researchers led by Tomas Mikolov at Google. It is a prediction-based model rather than a frequency-based one: it uses predictive analysis to make a weighted guess about a word co-occurring with its neighboring words.
Word2Vec helps us in the following ways:
- Similar words – finding the words nearest to a query word.
- Sentiment analysis – a few dimensions of the vector can indicate whether the sentiment is positive or negative.
- Machine translation and question answering – similar words are treated similarly.
- Categorization – words from the same field (politics, sports, etc.) are clustered in the same area of the vector space.
There are two popular Word2Vec architectures, both based on a shallow neural network:
- CBOW (Continuous Bag of Words): predicts a word on the basis of its neighbors. It trains faster and performs slightly better on frequent words and syntactic tasks.
- Skip-gram: predicts the neighbors of a word. Training is slower, but it performs better on semantic tasks and on rare words.
Continuous bag-of-words (CBOW)
This model predicts the current word given the context words within a specific window. The input layer contains the context words and the output layer contains the current word. The hidden layer size is the number of dimensions in which we want to represent the current word. CBOW uses both the n words before and the n words after the target word w(t) to predict it, as shown in the figure below.
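To make the input/output relationship concrete, here is a minimal sketch in plain Python (no libraries; the toy sentence is made up for illustration) that builds CBOW training pairs, mapping each set of context words to the target word they surround:

```python
# Build CBOW training pairs: (context words -> target word),
# taking up to n words before and n words after the target.
def cbow_pairs(tokens, n=2):
    pairs = []
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - n):t] + tokens[t + 1:t + 1 + n]
        pairs.append((context, target))
    return pairs

tokens = ["the", "quick", "brown", "fox", "jumps"]
for context, target in cbow_pairs(tokens):
    print(context, "->", target)
# e.g. ['the', 'brown', 'fox'] -> quick
```

In the real model, the context vectors are averaged in the hidden layer and the network is trained to predict the target at the output layer.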
Skip-gram
In the skip-gram model, instead of using the surrounding words to predict the center word, we use the center word to predict the surrounding words.
The skip-gram objective thus sums the log probabilities of the n surrounding words to the left and to the right of the target word w(t) to produce the following objective:

J(θ) = (1/T) Σ_{t=1..T} Σ_{-n ≤ j ≤ n, j ≠ 0} log p(w(t+j) | w(t))
Consider an array of words W. If W(t) is the input (center word), then W(t-2), W(t-1), W(t+1), and W(t+2) are the context words when the sliding window size is 2.
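This sliding window can be sketched directly in plain Python (no libraries; the example sentence is made up): each center word W(t) is paired with every context word inside the window, mirroring W(t-2), W(t-1), W(t+1), W(t+2) above:

```python
# Build skip-gram training pairs: (center word -> one context word)
# for every offset j in [-window, window], j != 0.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j == 0 or not (0 <= t + j < len(tokens)):
                continue  # skip the center itself and out-of-range positions
            pairs.append((center, tokens[t + j]))
    return pairs

tokens = ["I", "like", "natural", "language", "processing"]
print(skipgram_pairs(tokens)[:4])
# e.g. ('I', 'like'), ('I', 'natural'), ('like', 'I'), ('like', 'natural')
```

Each pair becomes one training example: the network is asked to assign high probability to the context word given the center word.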
We now see the idea behind word embeddings: words that occur in similar contexts tend to be closer to each other in vector space. To generate word vectors in Python, the nltk and gensim libraries are commonly used.