Is that horse running? bringing memory to AI with RNNs.


The human brain does two functions very well – the first is that it can recognise images very well. The second is that it can detect patterns based on time and make predictions based on those patterns.

The last few months I played around with the first function- — around building image recognition neural networks (CNN)– the ones that help you determine whether a picture is of a dog or human (source code) or another example is detecting faces in a picture (article) or recognising handwritten numbers (article).

For example, we can write a CNN that determines that the above picture is of a horse.

But, if you ask a CNN to tell us if this horse is running or not — it would have a hard time. If you ask a human —  the answer is dependent on a number of things and most on time. If you saw this picture as a series of pictures where the horse was loitering around, you could make the determination that it is likely that the horse is standing around. In that case, we used human memory to solve the problem of determining what the horse is doing.

Recently, I started to dig into the second function of the human brain and see how you can mimic the human brain. To answer a question — how do you introduce an element of time in AI.

The canonical example is that I would like to predict the prices of a stock. Another example is Google Assistant, it needs to understand the context to service a request.

Recurrent Neural Networks (RNNs) and Long Short Term Memory (LSTMs) help you solve problems that have dependencies on time. I have blogged (here and here) about LSTMs previously so lets take RNNs in this blog.

The following picture is called an unfolded model of a RNN.

Recurrent Neural Network

Let’s break this picture down.

Each circle is a feedforward NN (ffnn) which means it is taking input (i), does some calculation and puts out a value (o). The weights calculated by the ffnn are Wi and Wo on the input and output respectively.  NN’s typically take anywhere between 1 to thousands of inputs, thus, i can be in thousands. NNs typically output 1..n outputs as well, thus, o can have that range as well.

So far so good.

Let’s now bring in the notion of time.

RNN’s keep an internal state around (s) and Ws are the weights that are generated for that state. Think of s as the memory component of the RNN.

The way you bring in the memory element along is that you feed the memory from time t as the input to time t+1 and so on. This, is how RNNs are different than standard feed forward networks. The input increases from i to i+s.


s_t = some_function (Wi*i + s_t-1*Ws)

The picture is that of a simple NN repeating itself over time starting with t upto t+2 but really going on to infinity.

The next level of complexity is to stack an RNN on top of another and create a 2 layer RNN. Two layers is the beginning because you can stack an arbitrary number of them.


This lattice of RNNs then provide the flexibility to process temporal dependent data and make complex predictions.

Leave a Reply