Inner workings of an AI that mimics human memory

In the last blog, I gave an overview of LSTMs (long short term memory networks), an AI technique that mimics human memory. In this blog, I will go two layers deeper and draw out the building blocks of this technology.

[Figure: LSTM-1, the high-level LSTM block]

As a reminder, at a 50,000-foot level, the building block looks like the image above: long and short term memory come in on the left –> some input arrives –> new long and short term memory come out on the right, plus an output that is the network's determination of what the input is.

Let’s open the box called the LSTM NN (neural network). This block is composed of four blocks or gates:

  • The Forget Gate
  • The Learn Gate
  • The Remember Gate
  • The Use Gate

The intuitive understanding of the gates is as follows: When some new input comes in, the system determines what from the long term memory should be forgotten to make space for the new stuff coming in; this is done by the forget gate. Then, the learn gate determines what should be learnt from the short term memory and the new input, and what should be dropped.

The processed output from these gates is fed to the remember gate, which then updates the long term memory; in other words, a new long term memory is formed from the updated short term and long term memory. Finally, the use gate kicks in and produces a new short term memory and an output.

[Figure: LSTM-2, the four gates inside the LSTM block]

Going a level deeper:

Learn Gate

The learn gate is broken in two phases: Combine –> Ignore.

  • Combine: In the combine step, the system takes the short term memory and the input and combines them. In the example, the output will be Squirrels, Trees and Dog/Wolf (we don't know which yet – see the previous blog for context).
  • Ignore: In the second phase, information that isn’t pertinent will be dropped. In the example, the information about trees is dropped because the show was about wild animals.

Forget Gate

The forget gate decides what to keep from the long term memory. In the example, the show is about wild animals, but the long term memory also holds information about wild flora, so the forget gate decides to drop the information about the flora.

Remember Gate

This gate is very simple. It adds output from the Learn gate and Forget gate to form the new long term memory. In the example, the output will be squirrel, dog/wolf and elephants.

Use Gate

This gate combines the output of the forget gate with the short term memory and the current input to produce the new short term memory, which also serves as the output of the block. In the example, the new short term memory is Squirrel and Wolf, and the prediction is Wolf.

The math behind the various gates, for the mathematically inclined

Learn Gate

Combine phase (output = Nt)

Take the STM from time t-1, take the current event Et and pass them through a tanh function.

Nt = tanh (STM_t-1, Et)

Mathematically, Nt = tanh (Wn [STM_t-1, Et] + bn), where Wn and bn are the weight matrix and bias vector.

Then, the output from the combine phase is multiplied by another vector called i_t, which is the ignore factor from the Ignore phase.

Ignore phase (output = i_t)

We create a new neural network that takes the input and the STM and applies the sigmoid function to them.

i_t = sigmoid ( Wi [STM_t-1, Et] + bi)

Thus, the output from the Learn gate is:

tanh (Wn [STM_t-1, Et] + bn) * sigmoid ( Wi [STM_t-1, Et] + bi)
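To make the notation concrete, here is a small numpy sketch of the learn gate. The sizes, weights and inputs below are made-up illustrative values, not anything from the course:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_gate(stm_prev, event, Wn, bn, Wi, bi):
    # Combine phase: squash the previous STM and the current event through tanh.
    x = np.concatenate([stm_prev, event])     # [STM_t-1, Et]
    n_t = np.tanh(Wn @ x + bn)
    # Ignore phase: the ignore factor decides how much of N_t to keep.
    i_t = sigmoid(Wi @ x + bi)
    return n_t * i_t

# Illustrative sizes: memory of size 3, event of size 2.
rng = np.random.default_rng(0)
stm_prev, event = rng.normal(size=3), rng.normal(size=2)
Wn, bn = rng.normal(size=(3, 5)), np.zeros(3)
Wi, bi = rng.normal(size=(3, 5)), np.zeros(3)
print(learn_gate(stm_prev, event, Wn, bn, Wi, bi))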

Forget Gate

The forget output is calculated by multiplying the long term memory with a forget factor (ft).

The forget factor is calculated using the short term memory and the input.

ft = sigmoid( Wf [STM_t-1, Et] + bf)

Forget output = LTM_t-1 * ft
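A matching numpy sketch of the forget gate, with the same caveat that all names and shapes are purely illustrative:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_gate(ltm_prev, stm_prev, event, Wf, bf):
    # The forget factor is computed from the previous STM and the current event.
    x = np.concatenate([stm_prev, event])     # [STM_t-1, Et]
    f_t = sigmoid(Wf @ x + bf)
    # Element-wise multiplication scales down what the LTM should forget.
    return ltm_prev * f_t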

Remember Gate

The remember gate takes output from the Forget and Learn gates and adds them together.

LTMt = LTM_t-1 * ft + Nt * i_t
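In code, the remember gate really is just an element-wise addition of the two gate outputs above; a tiny sketch with made-up 3-element memories:

import numpy as np

def remember_gate(forget_output, learn_output):
    # New LTM = (LTM_t-1 * ft) + (Nt * i_t), an element-wise addition.
    return forget_output + learn_output

print(remember_gate(np.array([0.1, 0.8, 0.0]), np.array([0.0, 0.2, 0.9])))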

Use Gate

The use gate applies a tanh function to the output of the forget gate and multiplies it by a sigmoid of the short term memory and the event; the result is the new short term memory, which is also the output of the block.

STMt = tanh (Wu (LTM_t-1 * ft) + bu) * sigmoid ( Wv [STM_t-1, Et] + bv)
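And a sketch of the use gate under the same illustrative conventions, taking the forget gate's output along with the previous STM and the event:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def use_gate(forget_output, stm_prev, event, Wu, bu, Wv, bv):
    # tanh applied to the forget gate's output (the retained long term memory)...
    u_t = np.tanh(Wu @ forget_output + bu)
    # ...scaled by a sigmoid of the previous STM and the current event.
    x = np.concatenate([stm_prev, event])     # [STM_t-1, Et]
    v_t = sigmoid(Wv @ x + bv)
    # The product is the new short term memory, which is also the block's output.
    return u_t * v_t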

Summary

The four gates mimic a point-in-time memory system. To truly envision the full system, think of a lattice of connected cells, each one handling a separate step in time. The memory system thus continually evolves and learns over a period of time.
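Putting the four gates together, here is a minimal, self-contained sketch of one LSTM cell unrolled over a short sequence of events. Everything here (the sizes, the random weights, the sequence itself) is a placeholder purely for illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(ltm, stm, event, p):
    x = np.concatenate([stm, event])                  # [STM_t-1, Et]
    n_t = np.tanh(p['Wn'] @ x + p['bn'])              # learn gate: combine
    i_t = sigmoid(p['Wi'] @ x + p['bi'])              # learn gate: ignore
    f_t = sigmoid(p['Wf'] @ x + p['bf'])              # forget gate
    new_ltm = ltm * f_t + n_t * i_t                   # remember gate
    u_t = np.tanh(p['Wu'] @ (ltm * f_t) + p['bu'])    # use gate
    v_t = sigmoid(p['Wv'] @ x + p['bv'])
    new_stm = u_t * v_t                               # new STM, also the output
    return new_ltm, new_stm

# Illustrative sizes: memory of size 4, events of size 3, a sequence of 5 events.
rng = np.random.default_rng(1)
mem, ev = 4, 3
params = {k: rng.normal(scale=0.5, size=(mem, mem + ev)) for k in ('Wn', 'Wi', 'Wf', 'Wv')}
params['Wu'] = rng.normal(scale=0.5, size=(mem, mem))
for k in ('bn', 'bi', 'bf', 'bu', 'bv'):
    params[k] = np.zeros(mem)

ltm, stm = np.zeros(mem), np.zeros(mem)
for event in rng.normal(size=(5, ev)):                # the lattice of cells over time
    ltm, stm = lstm_step(ltm, stm, event, params)
print(stm)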

(credit: the math and the example come from the Udacity deep learning coursework)

Mimicking human memory with AI

Memory is a fascinating function of the human brain. Specifically, it is the interplay of short term memory and long term memory, working in conjunction to help us decide how to respond to the current stimulus, that makes us function in the real world.

Let’s take an example —

I am watching a program on TV and suddenly a picture of a dog/wolf comes up. From the looks of it, I cannot distinguish between the two. What is it – a dog or a wolf?

If the previous image was a squirrel, and squirrels are likely to be found in a domestic setting, I could assess that the current image is a dog.

This would be reasonably true, if all I had was my short term memory.

At this point, my long term memory kicks in and tells me that I am watching a show about wild animals. Voila, the obvious answer is that the current image is of a wolf.

LSTMs mimic human memory

A specific branch of Deep Learning in AI called LSTMs (long short term memory networks) is used to solve problems that have temporal (or time based) dependencies. In other words, LSTMs are used to mimic human memory to predict outcomes. Unlike vanilla RNNs (recurrent neural networks), which only keep short term memory around, LSTMs add long term memory to bring high fidelity to their predictions.

The Working of LSTMs

[Figure: LSTM-1, the high-level LSTM block]
The NN takes in the long term memory (Elephant), the short term memory (Squirrels, Trees) and the input (Dog/Wolf) and makes the following set of determinations:

  • What should it forget? Trees in this case because the show is about wild animals and not trees.
  • What should it learn? There is a Dog/Wolf in addition to the squirrels and trees.
  • What should it predict? The Wolf
  • What should it remember long term? Elephant, Squirrel, Wolf
  • What should it remember short term? Squirrel, Wolf

All of the above is done for the current time step, and the new long and short term memories are fed in alongside the next input that comes at time t+1.

Thus, you can think of the above picture as recurring for every time epoch t.
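In code terms, you can picture it as a single cell function called once per time epoch, feeding its own outputs back in as the memories for the next step. Here is a deliberately over-simplified stub just to show that recurrence shape (the real gate logic comes in the next blog):

def lstm_cell(long_term, short_term, event):
    # Stub: a real LSTM cell would decide what to forget, learn, remember and use.
    new_long_term = long_term + [event]   # here, nothing is ever forgotten
    new_short_term = [event]              # here, only the latest event is kept
    output = event                        # here, the "prediction" is just the input
    return new_long_term, new_short_term, output

long_term, short_term = [], []
for event in ["Elephant", "Squirrels and Trees", "Dog/Wolf"]:   # one call per time epoch t
    long_term, short_term, output = lstm_cell(long_term, short_term, event)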

In the next blog, we will deep dive into the LSTM NN and see how each of the bulleted questions is answered.

Pretty interesting, isn’t it?

(disclaimer: the example used is from the Deep Learning course work on Udacity)

How do they detect faces in pictures?

 

06 Mar 2018


I was fascinated when Facebook launched the feature where it put a box around a human head (and a bit creeped out when it started suggesting the name of the human next to the box). I always wondered how they did it and filed it under machine-learning magic-ery. Now, I know how they do it so let me peel back the curtain.

There are two distinct problem domains in the feature:

  • Find the human – we will use this blog to lift the curtain on this piece of magic.
  • Label the human – this is supervised machine learning, and we will ignore this problem in this blog.

The “Find the human” problem is solved through something called “Haar Cascade Classifiers” – there is a detailed article for brilliant humans, and the rest of us can follow along in this blog :-).

The underlying building block is a Classifier, but let's drop the terminology and use airport security as a metaphor to explain the process.

Think of the face detection solution as an airport security problem where a series of security guards each performs a specialised task. The guard at the airport entrance is responsible for ensuring that no suspicious car is loitering around the airport. The guard at the security gate is responsible for letting in only those with a valid ID and a boarding pass. The person behind the baggage scanner is responsible for weeding out any harmful objects in the handbags. The person behind the body scanner ensures that no one gets in with a gun. The explosives screener swabs with a specialised detector paper, puts it in a machine, and finds out whether the person under consideration is carrying hidden explosives.

Each of these security guards is a Classifier that classifies a particular threat. When they are put together in a series, we get a Cascade of Classifiers, each building on the work of the other. Every one of them has to perform their specialised task for a successful outcome – which, in this case, is that a person is allowed into the airport lounge and can board his/her plane. Each classifier goes through a great deal of specialised training to perform its task. Makes sense?

So let's apply this metaphor to the face detection machine learning algorithm. In ML, each classifier focusses on a specific feature within a picture. The most basic classifier tells you something as simple as “this is a horizontal edge” or “this is a vertical edge”, where edge detection is a feature. This classifier feeds into another one that perhaps says “this is a square”, and so on and so forth. Eventually, you get to a classifier that tells you “this is the bridge of a nose” or “these are eyes”. Each classifier has been fed hundreds of thousands of images that are either positive (human in the picture) or negative (no human in the picture) so that it learns to classify pictures correctly.

So how many such features are there? Turns out, a whole lot. For a typical 24×24 pixel window, there are 160,000+ possible features. The Haar in “Haar Cascade Classifier” refers to the mathematical function used to compute these features, and the training process whittles the number of features to look out for down to about 6,000.

Now it turns out that applying this knowledge in our programs is a lot simpler than the entire training process, because opencv.org provides a Python package, OpenCV, that ships with pre-trained Haar cascade classifiers for detecting faces in pictures.
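For reference, here is a minimal sketch of such a detection function, assuming a recent opencv-python install where cv2.data.haarcascades points at the bundled cascade files; the function name and detection parameters are my own illustrative choices:

import cv2

def contains_human_face(image_path):
    """Return True if the pre-trained Haar cascade finds at least one frontal face."""
    # Load the frontal-face cascade that ships with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    # Haar cascades work on grayscale images.
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # scaleFactor and minNeighbors are common starting values; tune for your pictures.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

# Example usage: contains_human_face("my_picture.jpg") -> True or False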

I ran a short function to detect humans in about 100 pictures with humans and ended up with a 100% detection rate – not bad at all. Running it over 100 dog pictures returned an 89% accuracy rate; thus, 11% of the dogs were categorised as humans, and if you know me, I think that is fair because some dogs are like humans :-).

My github repo is here if you want to see the code, although you can find it on the Haar based Classifier link as well.

React Native Component Styling

30 Sep 2017

Unlike web apps, React Native doesn't have universal styling specified via CSS. React Native solves this by asking developers to co-locate the styling within the component file. As a developer, I found this very convenient to work with – I knew exactly what styling I had to bring into my component. That said, I would expect this to be a challenge when working with a designer, requiring constant back and forth. Here is an example of a component that I put together.

Pre-Styling

[Screenshot: the Card component before styling]

Post-Styling

[Screenshot: the Card component after styling]

Code

import React from 'react';
import { View } from 'react-native';

// Card wraps its children in a styled container View.
const Card = (props) => {
    return (
        <View style={styles.containerStyle}>
            {props.children}
        </View>
    );
};

// The styling lives right next to the component instead of in a separate CSS file.
const styles = {
    containerStyle: {
        // Light grey border with slightly rounded corners.
        borderWidth: 1,
        borderRadius: 2,
        borderColor: '#ddd',
        borderBottomWidth: 0,
        // The shadow* properties apply on iOS; elevation is the Android equivalent.
        shadowColor: '#000',
        shadowOffset: { width: 0, height: 2 },
        shadowOpacity: 0.1,
        shadowRadius: 2,
        elevation: 1,
        // Spacing around the card.
        marginLeft: 5,
        marginRight: 5,
        marginTop: 10
    }
};

export default Card;