Let’s start with a couple of problems.
Problem a) I read a lot of news articles and notice that certain words appear close together: “New” next to “York”, “New” next to “Delhi”. However, if I am reading a book about the history of India, it is unlikely that “New” will appear next to “Delhi”; there, “Delhi” stands alone. In other words, I am smart enough to understand the context and the relationships between these words. The question is: can my machine learning program learn these relationships from the data it is presented with?
Problem b) My grammar teacher has done a good job of instilling rules in my head: the female of a donkey is called a jenny (did you know that?). I can work out that a jenny is a donkey while reading a text about donkeys, but can my machine learning algorithm make that association?
The larger class of problem is how to understand some body of text, perhaps all the novels of Isaac Asimov or the entire Wikipedia database, and make predictions based on that dataset.
The machine learning algorithm that makes these smart associations is called “Word2Vec”. Word2Vec is a model used to create word embeddings: it maps each word to a vector in a continuous vector space learned from the text it is trained on. As training proceeds, clusters of related words settle down close to each other in that space.
The larger class of techniques is called “word embeddings”. These are learned by shallow neural networks and are useful when there is a huge number of classes, one per word in the vocabulary. Because the network places semantically related words near each other, it captures richer relationships than a plain one-hot encoding can.
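The idea of “related words settling close to each other” can be made concrete with cosine similarity between embedding vectors. Below is a minimal sketch using made-up 3-dimensional vectors (real models use hundreds of dimensions, and the values here are illustrative, not learned):

```python
import numpy as np

# Hypothetical 3-dimensional embeddings (invented for illustration;
# a trained model would learn these values from text).
emb = {
    "new_york":  np.array([0.9, 0.1, 0.2]),
    "new_delhi": np.array([0.8, 0.2, 0.3]),
    "banana":    np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two city names point in nearly the same direction,
# while "banana" points elsewhere.
print(cosine(emb["new_york"], emb["new_delhi"]))  # close to 1
print(cosine(emb["new_york"], emb["banana"]))     # much smaller
```

This is exactly the property a trained embedding space gives you: distance between vectors stands in for semantic relatedness.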
Word2Vec works with a large dataset. Let’s take the example of feeding an entire corpus of scientific articles on donkeys to the algorithm. Assume this yields a vocabulary of 10,000 words. Each of these words is represented as a one-hot encoded input to the Word2Vec network. Training produces a set of weights that capture the relationships between words, and the output for a given input word is a probability distribution over all 10,000 words. Thus, fed “jack” (the male of a donkey), the output distribution will likely be heavily weighted towards “jenny”.
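The one-hot-in, probability-distribution-out pipeline above can be sketched end to end. This is a toy skip-gram trainer, assuming an invented five-word vocabulary and hand-picked (centre, context) pairs instead of the 10k-word corpus; it is a teaching sketch, not a production Word2Vec implementation:

```python
import numpy as np

np.random.seed(0)

# Toy vocabulary standing in for the 10k-word corpus in the text.
vocab = ["jack", "jenny", "foal", "donkey", "bray"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

# Two weight matrices: input->hidden (rows are the word embeddings)
# and hidden->output (used to score every word in the vocabulary).
W_in = np.random.randn(V, D) * 0.1
W_out = np.random.randn(D, V) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(word):
    """Probability distribution over the whole vocabulary for a centre word."""
    h = W_in[word_to_idx[word]]  # one-hot input just selects this row
    return softmax(h @ W_out)

# (centre, context) pairs standing in for co-occurrences in real text.
pairs = [("jack", "jenny"), ("jenny", "jack"),
         ("donkey", "bray"), ("bray", "donkey")]

lr = 0.5
for _ in range(500):
    for centre, context in pairs:
        i, j = word_to_idx[centre], word_to_idx[context]
        h = W_in[i]
        probs = softmax(h @ W_out)
        err = probs.copy()
        err[j] -= 1.0                 # gradient of cross-entropy w.r.t. logits
        grad_h = W_out @ err
        W_out -= lr * np.outer(h, err)
        W_in[i] -= lr * grad_h

# After training, feeding in "jack" yields a distribution
# heavily weighted towards "jenny", as described above.
```

The one-hot input never needs to be materialised: multiplying a one-hot vector by `W_in` just selects one row, which is why the rows of `W_in` become the word embeddings.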
Chris McCormick has a great overview of the Skip-gram model for Word2Vec.