A convolutional neural network (CNN) is a type of neural network architecture designed to work with visual data. It is very similar to an ordinary ANN: it is made up of artificial neurons and has learnable parameters. In this article we will discuss the architecture of a CNN and implement one on the CIFAR-10 dataset.
- Why CNNs?
- Convolution layer
- ReLU Activation Layer
- Pooling layer
- Fully connected layer
- Data Preprocessing
- Designing the CNN
- Applications of CNN
Why CNNs?
The main benefit of using a CNN over a simple ANN on visual data is that its architecture is built around the structure of images. One of the main features of a CNN is weight sharing, which significantly reduces the number of weights. High-quality images consist of a very large number of pixels, which would lead to a very large number of weights in a simple ANN. This is where the need for CNNs becomes clear, since they scale to high-resolution images as well. Another advantage of CNNs is that they are very good feature extractors. This means you can take an already trained CNN, feed your data through it to extract useful features at each level using its trained weights, and fine-tune the network a little for your specific task.
Several new layers are introduced in CNNs to extract useful features from the image or to reduce its size without losing the original representation.
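To make the effect of weight sharing concrete, here is a small back-of-the-envelope sketch. The helper names `dense_params` and `conv_params` and the layer sizes (100 neurons, 100 filters of size 3×3) are illustrative assumptions, not figures from this article.

```python
def dense_params(num_inputs, num_neurons):
    """Weights plus biases of a fully connected layer."""
    return num_inputs * num_neurons + num_neurons


def conv_params(filter_h, filter_w, in_channels, num_filters):
    """Weights plus biases of a convolutional layer (shared across all positions)."""
    return filter_h * filter_w * in_channels * num_filters + num_filters


# A 32x32 RGB image flattened into a dense layer with 100 neurons.
small_dense = dense_params(32 * 32 * 3, 100)      # 307300
# A 1000x1000 RGB image flattened into the same dense layer.
large_dense = dense_params(1000 * 1000 * 3, 100)  # 300000100
# 100 filters of size 3x3 over 3 input channels, for either image size.
conv = conv_params(3, 3, 3, 100)                  # 2800

print(small_dense, large_dense, conv)
```

The convolutional layer's parameter count does not depend on the image resolution at all, which is why the dense layer becomes impractical for large images while the convolutional layer does not.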
The convolutional layer applies a convolution operation to its input and passes the result to the next layer. A convolution computes a dot product between the weights of a filter and the small region of the input volume that the filter currently overlaps. The output dimensions depend on the filter size and the number of filters used.
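The sliding dot product described above can be sketched in plain NumPy. This is a minimal illustration for a single channel and a single filter, with no padding and a step of one pixel; the function name `conv2d_single` and the example values are assumptions made for the demo. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np


def conv2d_single(image, kernel):
    """Slide one 2-D filter over one 2-D channel (no padding, step of 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Region of the input that the filter currently overlaps.
            region = image[i:i + kh, j:j + kw]
            # Dot product between the filter weights and that region.
            out[i, j] = np.sum(region * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                  # toy 2x2 filter
feature_map = conv2d_single(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Here a 4×4 input and a 2×2 filter give a 3×3 output, showing how the filter size changes the spatial dimensions. Using several filters would stack several such maps along the depth axis.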
The Rectified Linear Unit (ReLU) layer applies the ReLU activation element-wise. It is a mathematical function that keeps positive values unchanged and replaces negative values with 0: f(x) = max(0, x).
It does not change the dimensions of the previous layer.
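As a quick illustration, ReLU can be written in one line with NumPy. The function name `relu` and the sample values are chosen only for this sketch.

```python
import numpy as np


def relu(x):
    """Element-wise ReLU: negative entries become 0, the rest are unchanged."""
    return np.maximum(0, x)


x = np.array([[-2.0, 0.5],
              [3.0, -0.1]])
y = relu(x)
print(y)        # [[0.  0.5]
                #  [3.  0. ]]
print(y.shape)  # (2, 2), same as the input
```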
The pooling layer performs a down-sampling operation along the width and height of its input. Its purpose is to reduce the spatial dimensions. There are various types of pooling, of which the most common is max pooling, i.e. taking the maximum element from each window.
Stride decides how far we move the window at each step. With a stride of one, we move across and down a single pixel at a time. With higher stride values, we move several pixels at a time and hence produce smaller output volumes.
Padding is used to preserve boundary information, since without padding the corner pixels are covered by the filter only once while interior pixels are covered many times.
The flattening layer converts the 3-dimensional volume (height, width, depth) into a single long vector that is fed to the fully connected layer, or dense layer. The dense layer in turn connects every neuron in one layer to every neuron in the next.
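A minimal NumPy sketch of max pooling on a single channel follows. The function name `max_pool2d`, the 2×2 window and the step of 2 pixels between windows are assumptions chosen because they are a common configuration, not settings taken from this article.

```python
import numpy as np


def max_pool2d(x, size=2, stride=2):
    """Max pooling over one 2-D channel with a square window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the largest element
    return out


x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]])
pooled = max_pool2d(x)
print(pooled)  # [[6 4]
               #  [7 9]]
```

The 4×4 input shrinks to 2×2, halving both the width and the height.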
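The combined effect of filter size, stride and padding on the output size follows the standard formula (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the padding and S the stride. The helper name `conv_output_size` below is hypothetical; the formula applies along each spatial dimension separately.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling window."""
    return (input_size - filter_size + 2 * padding) // stride + 1


print(conv_output_size(32, 3))                       # 30: no padding shrinks the image
print(conv_output_size(32, 3, padding=1))            # 32: padding of 1 preserves the size
print(conv_output_size(32, 3, stride=2, padding=1))  # 16: stride of 2 halves it
print(conv_output_size(32, 2, stride=2))             # 16: a 2x2 pooling window with stride 2
```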
Fully Connected Layer and Output Layer
Fully connected layers, or dense layers, are the same hidden layers we discussed for simple ANNs: a defined number of neurons, each connected to the elements of the previous layer. The output layer is also the same, but its number of neurons depends on the task. For instance, the CIFAR-10 dataset has 10 classes, so the output layer will have 10 neurons.
To summarize the architecture of a CNN: it consists of an input layer followed by a conv layer. The dimensions of the conv layer depend on the data and the problem, and should be chosen accordingly. After the conv layer there is an activation layer, usually ReLU, since it tends to give better results in practice. After some conv and ReLU combinations, a pooling layer is used to reduce the size. Then, after several repetitions of this pattern, a flattening layer flattens the volume for the fully connected layer. After these layers, the last layer is the output layer.
Let’s implement a convolutional neural network on a very famous dataset known as CIFAR-10.
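The step from a 3-dimensional volume to 10 output neurons can be sketched with NumPy. The volume shape (4, 4, 8), the random weights and the variable names are all assumptions for illustration; in a real network the weights are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature volume coming out of the last pooling layer (height, width, depth).
volume = rng.standard_normal((4, 4, 8))

# Flattening: 4 * 4 * 8 = 128 values in a single long vector.
flat = volume.reshape(-1)

# Output layer for 10 classes: every input value connects to every output neuron.
weights = rng.standard_normal((flat.size, 10)) * 0.01
biases = np.zeros(10)
scores = flat @ weights + biases

print(flat.shape)    # (128,)
print(scores.shape)  # (10,)
```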
The CIFAR-10 dataset consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class.
The first step of any machine learning task is to preprocess the data. The following code contains everything required to preprocess the data:
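A minimal sketch of what such preprocessing might look like, assuming NumPy and the usual CIFAR-10 layout of `uint8` images shaped (N, 32, 32, 3) with integer labels shaped (N, 1). The helper name `preprocess` and the choice of scaling to [0, 1] plus one-hot encoding are assumptions, not necessarily the steps used for the results in this article. Random stand-in arrays are used so the snippet runs without downloading anything.

```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Scale pixels to [0, 1] and one-hot encode the integer labels."""
    x = images.astype("float32") / 255.0
    y = np.eye(num_classes, dtype="float32")[labels.reshape(-1)]
    return x, y


# Stand-in data with the same shapes and dtypes as CIFAR-10.
# The real arrays could come from, e.g., tf.keras.datasets.cifar10.load_data().
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(8, 1))

x, y = preprocess(fake_images, fake_labels)
print(x.shape, x.dtype)  # (8, 32, 32, 3) float32
print(y.shape)           # (8, 10)
```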
Designing the CNN
Now let’s design our model to classify the 10 classes in the dataset. First we have to define the architecture of the model.
Now we have to train our model. Before training, we need to define the learning rate, optimizer, metrics and batch size.
To evaluate the model, we can print the accuracy, since we defined it as a metric for our model. We can also print the confusion matrix to understand where our model falls short and how we can improve it.
We get an accuracy of around 79% on the test set, which is good for a first CNN developed from scratch!
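One possible architecture following the conv, ReLU, pooling, flatten and dense pattern described earlier is sketched below. It assumes TensorFlow's Keras API. The function name `build_cnn`, the filter counts (32 and 64), the 128-unit dense layer and the softmax output are illustrative assumptions rather than the exact model behind the results in this article.

```python
import tensorflow as tf


def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Small CNN: two conv/conv/pool stages, then flatten and dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        # One output neuron per class.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_cnn()
model.summary()  # prints the layers with their output shapes and parameter counts
```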
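These training settings can be sketched with the Keras `compile` and `fit` calls. The helper name `compile_and_fit`, the Adam optimizer, the learning rate of 0.001 and the batch size of 64 are assumed defaults for illustration, not the values used for the results in this article. A tiny stand-in model and random data keep the snippet fast and self-contained.

```python
import numpy as np
import tensorflow as tf


def compile_and_fit(model, x, y, learning_rate=1e-3, batch_size=64, epochs=1):
    """Attach optimizer, loss and metric to the model, then train it."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model.fit(x, y, batch_size=batch_size, epochs=epochs, verbose=0)


# Tiny stand-in model and random data, only to show the calls end to end.
tiny_model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
rng = np.random.default_rng(0)
x = rng.random((16, 32, 32, 3)).astype("float32")
y = tf.keras.utils.to_categorical(rng.integers(0, 10, size=16), num_classes=10)

history = compile_and_fit(tiny_model, x, y, batch_size=8)
print(sorted(history.history.keys()))  # ['accuracy', 'loss']
```

With real data, the model would be the CNN itself and the number of epochs would be much larger than one.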
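A confusion matrix can be built by hand with NumPy, as in the sketch below. The function name `confusion_matrix` and the toy labels are assumptions for illustration; with a trained model, `y_pred` would be the index of the largest output for each test image.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal

print(cm)        # [[1 1 0]
                 #  [0 1 1]
                 #  [0 0 2]]
print(accuracy)  # 4 of 6 correct
```

Off-diagonal entries show which pairs of classes the model confuses, which is a good starting point for deciding what to improve.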
Applications of CNN
CNNs are now used in almost every task in the computer vision domain, since they outperform older techniques when a significant amount of data is available. Some applications are listed below:
- Object detection (powers self-driving cars)
- Face recognition
- Neural style transfer
- X-ray diagnosis
- Satellite image analysis
- Astronomy