Unraveling the Magic of Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision and image analysis. These specialized neural networks have proven to be a game-changer in various applications, from facial recognition and object detection to medical image analysis and autonomous vehicles. In this blog, we will delve into the fascinating world of CNNs, exploring how they work, their key components, and their diverse applications.

At its core, a Convolutional Neural Network is a type of deep learning model inspired by the human visual system. It is designed to process visual data, such as images and videos, and it is particularly efficient at recognizing patterns, shapes, and features in these data types. 

Figure 1. Architecture of a Convolutional Neural Network 

The Jupyter Notebook for this blog can be found here.


Table of Contents

  1. Applications of Convolutional Neural Networks
  2. The Architecture of the Visual Cortex
  3. Convolutional Layers
    • Filters
    • Stacking Multiple Feature Maps
  4. Pooling Layers
  5. CNN Architectures
    • VGGNet
  6. Pretrained Models for Transfer Learning
  7. Conclusion

1. Applications of Convolutional Neural Networks

CNNs have found applications in various domains, showcasing their adaptability and effectiveness. Some of the most notable applications include:
  1. Image Classification - CNNs excel at classifying objects in images, which is the basis for numerous applications such as content filtering, quality control, and more.
  2. Object Detection - They are widely used in identifying and locating objects within images and videos, a crucial component in self-driving cars, surveillance systems, and augmented reality applications.
  3. Facial Recognition - CNNs are employed in facial recognition systems, which have become integral in security, authentication, and social media tagging.
  4. Medical Image Analysis - CNNs have shown incredible potential in diagnosing diseases from medical images like X-rays, MRIs, and CT scans. They assist radiologists in making accurate diagnoses.
  5. Natural Language Processing - CNNs are not limited to images. They are also used in text classification and sentiment analysis tasks, where convolutions are applied over sequences of word or character embeddings for feature extraction.
  6. Autonomous Vehicles - CNNs play a vital role in the perception systems of self-driving cars, helping them understand the road environment and make real-time decisions.

2. The Architecture of the Visual Cortex

The architecture of the visual cortex, located at the rear of the brain, is a marvel of biological engineering. It comprises a complex network of neurons organized into distinct areas, each with specialized functions. One of its key features is its hierarchical processing: lower-level areas detect basic features like edges and motion, while higher-level areas recognize more complex objects and scenes. This hierarchy allows for the gradual abstraction of visual information, ultimately forming our perception of the world. The visual cortex spans both hemispheres of the brain, each with distinct functions.

Figure 2. Biological neurons in the visual cortex respond to specific patterns in small regions of the visual field called receptive fields; as the visual signal makes its way through consecutive brain modules, neurons respond to more complex patterns in larger receptive fields.

Studies of the visual cortex inspired the neocognitron, introduced in 1980, which gradually evolved into what we now call convolutional neural networks.

3. Convolutional Layers

The most important building block of a CNN is the convolutional layer: neurons in the first convolutional layer are not connected to every single pixel in the input image, but only to pixels in their receptive fields (Figure 3). In turn, each neuron in the second convolutional layer is connected only to neurons located within a small rectangle in the first layer. This architecture allows the network to concentrate on small low-level features in the first hidden layer, then assemble them into larger higher-level features in the next hidden layer, and so on. This hierarchical structure is common in real-world images, which is one of the reasons why CNNs work so well for image recognition.
Figure 3. CNN layers with rectangular local receptive fields
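
To make the local-connectivity idea concrete, here is a minimal NumPy sketch (not from the original notebook) of a 2D convolution with "valid" padding. Each output pixel depends only on the kernel-sized receptive field directly beneath it, exactly the restricted connectivity described above:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel ("valid" padding).

    Each output pixel is computed from only the kernel-sized patch of the
    input directly beneath it -- its receptive field.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
identity_kernel = np.zeros((3, 3))
identity_kernel[1, 1] = 1.0          # passes through the center pixel only
out = conv2d_valid(image, identity_kernel)
```

With this identity kernel, each output pixel simply copies the center of its receptive field, which makes the sliding-window behavior easy to verify by hand.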

Filters

Filters play a fundamental role in signal processing, image analysis, and data manipulation. These mathematical operations are used to extract or enhance specific features within data. In image processing and computer vision, filters are often used to detect edges, textures, and other patterns by convolving them over an image. Filters can be linear or non-linear and are crucial components in tasks like image enhancement, noise reduction, and feature extraction. 

A layer full of neurons using the same filter outputs a feature map, which highlights the areas in an image that activate the filter the most. Of course, you do not have to define the filters manually: instead, during training the convolutional layer will automatically learn the most useful filters for its task, and the layers above will learn to combine them into more complex patterns.

Figure 4. Applying two different filters to get two feature maps


Figure 5. Feature maps obtained by applying different filters in the convolutional layer
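
The two-filter idea in Figures 4 and 5 can be reproduced with a small NumPy sketch (the filter shapes and toy image are illustrative, not from the original notebook). A vertical-line filter responds strongly to a vertical stripe, while a horizontal-line filter barely notices it:

```python
import numpy as np

# Two hand-crafted 7x7 filters: one detects vertical lines, one horizontal.
vertical_filter = np.zeros((7, 7))
vertical_filter[:, 3] = 1.0
horizontal_filter = np.zeros((7, 7))
horizontal_filter[3, :] = 1.0

def feature_map(image, kernel):
    """Slide the kernel over the image (valid padding) to get one feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image containing a single bright vertical line.
image = np.zeros((14, 14))
image[:, 7] = 1.0
v_map = feature_map(image, vertical_filter)    # strong response along the line
h_map = feature_map(image, horizontal_filter)  # weak response everywhere
```

The vertical-filter map peaks where the kernel's center column aligns with the stripe; the horizontal filter only ever overlaps the stripe at a single pixel, so its responses stay small. In a real CNN these filter weights are learned, not hand-written.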

Stacking Multiple Feature Maps

A convolutional layer has multiple filters (you decide how many) and outputs one feature map per filter, so it is more accurately represented in 3D (see Figure 6). It has one neuron per pixel in each feature map, and all neurons within a given feature map share the same parameters. A convolutional layer simultaneously applies multiple trainable filters to its inputs, making it capable of detecting multiple features anywhere in its inputs.

Figure 6. Convolutional layers with multiple feature maps, and images with three color channels
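
The 3D output shape and the effect of parameter sharing are easy to compute by hand. The helper functions below are an illustrative sketch (the formulas are standard; the function names are my own):

```python
def conv_output_shape(input_shape, num_filters, kernel_size=3, stride=1, padding=0):
    """Output shape of a conv layer: one feature map per filter."""
    h, w, channels = input_shape
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    return (out_h, out_w, num_filters)

def conv_param_count(in_channels, num_filters, kernel_size=3):
    """Each filter spans all input channels; weights are shared across positions,
    so the count is independent of the image size. +1 is the bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * num_filters

shape = conv_output_shape((28, 28, 3), num_filters=32)  # 32 stacked feature maps
params = conv_param_count(3, 32)                        # (3*3*3 + 1) * 32
```

Note that a fully connected layer mapping the same input to the same output would need millions of weights; weight sharing is what keeps this count at a few hundred.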

4. Pooling Layers

Once you understand how convolutional layers work, the pooling layers are quite easy to grasp. The goal is to subsample (i.e., shrink) the input image in order to reduce the computational load, memory usage, and the number of parameters (thereby limiting the risk of overfitting).

Just like in convolutional layers, each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a small rectangular receptive field. You must define its size, the stride, and the padding type, just like before. However, a pooling neuron has no weights; all it does is aggregate the inputs using an aggregation function such as the max or mean. 

Figure 7 shows a max pooling layer, which is the most common type of pooling layer. Only the max input value in each receptive field makes it to the next layer, while the other inputs are dropped. For example, in the lower-left receptive field in Figure 7, the input values are 1, 5, 3, and 2, so only the max value, 5, is propagated to the next layer. Because of the stride of 2, the output image has half the height and half the width of the input image.

Figure 7. Max pooling layer (2 x 2 pooling kernel, stride 2, no padding)
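
A 2 x 2 max pooling layer with stride 2 is a few lines of NumPy (a sketch with an illustrative input, not the exact values from Figure 7):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling, stride 2: keep only the max of each receptive field."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop odd edges (no padding)
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 5, 3, 2],
              [8, 0, 4, 7],
              [6, 1, 9, 2],
              [3, 4, 5, 0]])
y = max_pool_2x2(x)  # half the height and half the width of the input
```

Each 2 x 2 block contributes a single number (its maximum), so the 4 x 4 input becomes a 2 x 2 output; a pooling layer has no weights to learn.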

However, max pooling has some downsides too. It is obviously very destructive: even with a tiny 2 x 2 kernel and a stride of 2, the output will be two times smaller in both directions (so its area will be four times smaller).

5. CNN Architectures

Typical CNN architectures stack a few convolutional layers (each one generally followed by a ReLU layer), then a pooling layer, then another few convolutional layers (+ReLU), then another pooling layer, and so on. The image gets smaller and smaller as it progresses through the network, but it also typically gets deeper and deeper (i.e., with more feature maps), thanks to the convolutional layers (see Figure 8). At the top of the stack, a regular feedforward neural network is added, composed of a few fully connected layers (+ReLU), and the final layer outputs the prediction (e.g., a softmax layer that outputs estimated class probabilities).

Figure 8. A typical CNN architecture
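
This stacking pattern can be sketched in Keras (the filter counts, input size, and class count here are illustrative, not from the original notebook):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Conv(+ReLU) blocks alternating with pooling, then a fully connected head.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),   # 28x28 -> 14x14
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),   # 14x14 -> 7x7
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # estimated class probabilities
])
```

Notice how the spatial size shrinks (28 to 14 to 7) while the number of feature maps grows (32 to 64), exactly the smaller-but-deeper progression described above.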

A common mistake is to use convolutional kernels that are too large. For example, instead of using a convolutional layer with a 5 x 5 kernel, stack two layers with 3 x 3 kernels: it will use fewer parameters, require fewer computations, and usually perform better.
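
A quick check of the parameter savings (biases ignored; the channel counts are illustrative):

```python
# Per input channel and output channel:
weights_5x5 = 5 * 5            # one 5x5 layer -> 25 weights
weights_two_3x3 = 2 * (3 * 3)  # two stacked 3x3 layers -> 18 weights

# With C input channels and K filters per layer the gap persists, e.g. C = K = 64:
C = K = 64
params_5x5 = 5 * 5 * C * K                      # single 5x5 layer
params_two_3x3 = 3 * 3 * C * K + 3 * 3 * K * K  # first 3x3 layer + second 3x3 layer
```

Two stacked 3 x 3 layers also cover the same 5 x 5 receptive field, but with an extra nonlinearity in between, which is part of why they tend to perform better.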

VGGNet

The VGG network was developed by Karen Simonyan and Andrew Zisserman from the Visual Geometry Group (VGG) research lab at Oxford University. VGG is a deep convolutional neural network architecture known for its simplicity and effectiveness. It is characterized by its deep stack of layers, primarily composed of 3x3 convolutional filters with small receptive fields. This architecture, which includes configurations like VGG16 and VGG19, has achieved remarkable performance in image classification and object recognition tasks. The network's deep and uniform structure makes it easy to understand and train, contributing to its popularity as a reference model and a foundational component in developing more advanced convolutional neural networks.

Figure 9. The architecture of the VGG16 net
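
Keras ships VGG16 in `tf.keras.applications`. A minimal sketch of loading the convolutional base (here with random weights to avoid the large download; pass `weights="imagenet"` to get the pretrained parameters instead):

```python
from tensorflow.keras.applications import VGG16

# Convolutional base only (include_top=False drops the fully connected head).
conv_base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
conv_base.summary()  # 13 conv layers (3x3 kernels) in 5 blocks, each ending in max pooling
```

The 224 x 224 input is halved by each of the five pooling stages down to a 7 x 7 x 512 volume, which is what the fully connected head consumes in the full VGG16.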

6. Pretrained Models for Transfer Learning

If you want to build an image classifier but you do not have enough training data, then it is often a good idea to reuse the lower layers of a pretrained model. This is known as transfer learning.

In transfer learning, a model typically trained on a large dataset for a specific task is fine-tuned on a new, often smaller dataset or problem. This approach saves computational resources, reduces the need for extensive labeled data, and accelerates the development of effective models. It has found wide applications in various fields, including natural language processing, computer vision, and speech recognition, enabling powerful and efficient machine-learning solutions in real-world scenarios.

7. Conclusion

Convolutional Neural Networks are a remarkable class of deep learning models that have significantly impacted the field of computer vision and extended their influence to other domains. Their ability to automatically learn and extract features from visual data has led to remarkable advancements in image analysis, object recognition, and various other applications. As technology continues to advance, CNNs are likely to play an increasingly crucial role in shaping our digital future.

Stay tuned for more interesting topics on machine learning!




