Machine Learning Algorithms as a Mapping Between Spaces: From SVMs to Manifold Learning | by Salih Salih | Feb, 2024



Convolutional Autoencoders: Encoding Complexity into Simplicity

The code below shows an example of a convolutional autoencoder, a type of autoencoder that works well with images. We will use the popular MNIST dataset [LeCun, Y., Cortes, C., & Burges, C. J. (1998). The MNIST Database of Handwritten Digits. Retrieved from TensorFlow, CC BY 4.0], which contains 28×28 pixel grayscale images of handwritten digits. The encoder plays a crucial role by reducing the dimensionality of the data from 784 pixels to a smaller, more condensed representation (here, a 7×7×8 bottleneck of 392 values). The decoder then aims to reconstruct the original high-dimensional data from this lower-dimensional representation. The reconstruction is not perfect and some information is lost; the autoencoder copes with this by learning to prioritize the most important features of the data.

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

# Load MNIST dataset
(x_train, _), (x_test, y_test) = mnist.load_data()  # keep the test labels for the latent-space plots later
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), 28, 28, 1))
x_test = x_test.reshape((len(x_test), 28, 28, 1))

# Define the convolutional autoencoder architecture
input_img = layers.Input(shape=(28, 28, 1))

# Encoder
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

# Decoder
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# Autoencoder model
autoencoder = tf.keras.Model(input_img, decoded)
# Encoder-only model, used later to map images into the latent (bottleneck) space
encoder = tf.keras.Model(input_img, encoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=64, validation_data=(x_test, x_test))

# Visualization
# Sample images
sample_images = x_test[:8]
# Reconstruct images
reconstructed_images = autoencoder.predict(sample_images)

# Plot original images and reconstructed images
fig, axes = plt.subplots(nrows=2, ncols=8, figsize=(14, 4))
for i in range(8):
    axes[0, i].imshow(sample_images[i].squeeze(), cmap='gray')
    axes[0, i].set_title("Original")
    axes[0, i].axis('off')
    axes[1, i].imshow(reconstructed_images[i].squeeze(), cmap='gray')
    axes[1, i].set_title("Reconstructed")
    axes[1, i].axis('off')
plt.show()

Output of the code above: eight original test digits (top row) and their reconstructions (bottom row). Image by Author

The output above shows how well the autoencoder works. It displays pairs of images: the original digit images and their reconstructions after encoding and decoding. This example demonstrates that the encoder captures the essence of the data in a smaller form and that the decoder can approximate the original image, even though some information is lost during compression.
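
To put a rough number on how much information survives the compression, we can measure the reconstruction error directly. The snippet below is a small addition to the article's code (a minimal sketch, reusing the autoencoder and x_test defined above) that computes the per-image mean squared error over the whole test set.

import numpy as np

# Reconstruct every test image and measure the pixel-wise error
reconstructions = autoencoder.predict(x_test)
per_image_mse = np.mean(np.square(x_test - reconstructions), axis=(1, 2, 3))

print(f"Mean reconstruction MSE:   {per_image_mse.mean():.5f}")
print(f"Best reconstructed image:  {per_image_mse.min():.5f}")
print(f"Worst reconstructed image: {per_image_mse.max():.5f}")

A low average error with a handful of high-error outliers is what we would expect: most digits compress cleanly, while unusually written ones lose more detail.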

Now, let’s go further and visualize the learned latent space (the bottleneck). We will use PCA and t-SNE, two dimensionality-reduction techniques, to project the compressed data points onto a 2D plane. This step is important because it shows how the autoencoder organizes the data in the latent space and reveals any natural clusters of similar digits. We use both PCA and t-SNE so that we can compare how well each of them works.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Encode all the test data and flatten the (7, 7, 8) feature maps into 392-element vectors
encoded_imgs = encoder.predict(x_test)
encoded_imgs = encoded_imgs.reshape((len(encoded_imgs), -1))

# Reduce dimensionality using PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(encoded_imgs)

# Reduce dimensionality using t-SNE
tsne = TSNE(n_components=2, perplexity=30, n_iter=300)
tsne_result = tsne.fit_transform(encoded_imgs)

# Visualization using PCA
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.scatter(pca_result[:, 0], pca_result[:, 1], c=y_test, cmap=plt.cm.get_cmap("jet", 10))
plt.colorbar(ticks=range(10))
plt.title('PCA Visualization of Latent Space')

# Visualization using t-SNE
plt.subplot(1, 2, 2)
plt.scatter(tsne_result[:, 0], tsne_result[:, 1], c=y_test, cmap=plt.cm.get_cmap("jet", 10))
plt.colorbar(ticks=range(10))
plt.title('t-SNE Visualization of Latent Space')

plt.show()

Output of the code above: PCA (left) and t-SNE (right) visualizations of the latent space, colored by digit class. Image by Author

Comparing the two resulting graphs, t-SNE separates the different digit classes in the latent-space visualization better than PCA, since it captures non-linear structure. It creates distinct clusters with minimal overlap between classes. The autoencoder compresses images into a lower-dimensional space but still captures enough information to distinguish between different digits, as the t-SNE plot shows.

An important note here: t-SNE is a non-linear technique for visualizing high-dimensional data. It preserves local structure, which makes it useful for spotting clusters and patterns visually. However, it is not typically used as a feature-reduction step in a machine learning pipeline.
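
If we want more than a visual impression, one simple way to quantify the separation (an addition to the article, not part of its original code) is the silhouette score of each 2D embedding, computed with the digit labels as the cluster assignment. It assumes pca_result, tsne_result and y_test from the code above; higher values mean tighter, better-separated classes.

from sklearn.metrics import silhouette_score

# Silhouette score lies in [-1, 1]: higher means points of the same digit class
# sit close together and far from other classes in the 2D embedding
print("PCA embedding silhouette:  ", silhouette_score(pca_result, y_test))
print("t-SNE embedding silhouette:", silhouette_score(tsne_result, y_test))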

But what does an autoencoder like this actually learn?

Generally speaking, such an autoencoder first learns basic edges and textures, then parts of the digits such as loops and lines and how they are arranged, and finally whole digits (hierarchical features), all while capturing the unique essence of each digit in a compact form. It can fill in missing parts of an image and recognize common patterns in how digits are written.
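
One way to peek at the lower levels of this hierarchy (again, a sketch added to the article's code rather than part of it) is to inspect the activations of the first convolutional layer of the trained autoencoder. The code below assumes the input_img and autoencoder objects defined earlier, builds a model that outputs that layer's 16 feature maps, and plots them for a single test digit; the stroke- and edge-like responses in these maps are the low-level features described above.

# Model exposing the first convolutional layer (16 filters) of the trained autoencoder
first_conv = tf.keras.Model(input_img, autoencoder.layers[1].output)

# Feature maps for one test digit, shape (1, 28, 28, 16)
feature_maps = first_conv.predict(x_test[:1])

fig, axes = plt.subplots(nrows=2, ncols=8, figsize=(14, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(feature_maps[0, :, :, i], cmap='gray')
    ax.set_title(f"Filter {i}")
    ax.axis('off')
plt.show()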
