AutoEncoder – In data science, we often encounter multidimensional data relationships. Understanding and representing these is often not straightforward. But how do you effectively reduce the dimension without reducing the information content?

Unsupervised dimension reduction

One possibility is offered by unsupervised machine learning algorithms, which aim to code high-dimensional data as effectively as possible in a low-dimensional way.
If you don’t know the difference between unsupervised, supervised and reinforcement learning, check out this article we wrote on the topic.

What is an AutoEncoder?

The AutoEncoder is an artificial neural network that is used to unsupervised reduce the data dimensions.
The network usually consists of three or more layers. The gradient calculation is usually done with a backpropagation algorithm. The network thus corresponds to a feedforward network that is fully interconnected layer by layer.

Types

AutoEncoder types are many. The following table lists the most common variations.

The figure shows all common AutoEncoder types
AutoEncoder types

However, the basic structure of all variations is the same for all types.

Basic Structure

Each AutoEncoder is characterized by an encoding and a decoding side, which are connected by a bottleneck, a much smaller hidden layer.

The following figure shows the basic network structure.

The figure shows the basic AutoEncoder structure.
AutoEncoder model architecture


During encoding, the dimension of the input information is reduced. The average value of the information is passed on and the information is compressed in such a way.
In the decoding part, the compressed information is to be used to reconstruct the original data. For this purpose, the weights are then adjusted via backpropagation.
In the output layer, each neuron then has the same meaning as the corresponding neuron in the input layer.

Autoencoder vs Restricted Boltzmann Machine (RBM)

Restricted Boltzmann Machines are also based on a similar idea. These are undirected graphical models useful for dimensionality reduction, classification, regression, collaborative filtering, and feature learning. However, these take a stochastic approach. Thus, stochastic units with a particular distribution are used instead of the deterministic distribution.


RBMs are designed to find the connections between visible and hidden random variables. How does the training work?
The hidden biases generate the activations during forward traversal and the visible layer biases generate learning of the reconstruction during backward traversal.

Pretraining

Since the random initialization of weights in neural networks at the beginning of training is not always optimal, it makes sense to pre-train. The task of training is to minimize an error or a reconstruction in order to find the most efficient compact representation for input data.

The figure shows the pretraining procedure of an autoencoder according to Hinton.
Training Stacked Autoencoder


The method was developed by Geoffrey Hinton and is primarily for training complex autoencoders. Here, the neighboring layers are treated as a Restricted Boltzmann Machine. Thus, a good approximation is achieved and fine-tuning is done with a backpropagation.