Skin Cancer Detection and Lesion Classification using Multi-Convolutional Neural Networks

The problem statement calls for an image-processing and deep-learning solution that helps doctors and surgeons identify associated allergies from captured images of skin, predict the probability of skin cancer, and suggest remedies to prevent further damage.

My team at university presents a solution to this problem in the form of two independent Convolutional Neural Network (CNN) models: one for classifying types of skin lesions and associated allergies, and one for predicting the probability of skin cancer and its type. A Convolutional Neural Network (ConvNet/CNN) is a deep-learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another. The pre-processing required by a ConvNet is much lower than for other classification algorithms: while in primitive methods filters are hand-engineered, ConvNets can learn these filters and characteristics with enough training.

The model architecture used for predicting the probability of skin cancer was ResNet50 (Residual Network with 50 layers) + 3 convolutional layers. ResNet50 mitigates the vanishing gradient problem.
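A minimal sketch of this kind of architecture in Keras, assuming a TensorFlow setup. The filter counts, input resolution, and layer choices here are illustrative assumptions, not the original training code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cancer_classifier(input_shape=(224, 224, 3)):
    # ResNet50 base without its classification head
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=input_shape)
    x = base.output
    # three extra convolutional layers stacked on top of the base
    for filters in (256, 128, 64):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    # single sigmoid unit: benign vs. malignant
    out = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(base.input, out)

model = build_cancer_classifier()
```

In practice the base would typically be initialized with pretrained ImageNet weights rather than `weights=None`.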

As the gradient is backpropagated to earlier layers, repeated multiplication can make it extremely small. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly. In principle, if we take a 'shallow' network and stack more layers on top to create a deeper one, the deeper network should perform at least as well as the shallow one, since it could reproduce the shallow network exactly by setting the newly stacked layers to identity mappings (in practice this is unlikely to happen without architectural priors or better optimization methods). It was observed that this was not the case: training error sometimes got worse when more layers were stacked on top of a shallower model. As a solution, ResNet50 introduces skip connections, which allow deep residual layers to learn deviations from the identity mapping; hence the term "residual", referring to the difference from the identity.

In general, in a deep convolutional neural network, several layers are stacked and trained for the task at hand, and the network learns low-, mid-, and high-level features along the way. In residual learning, instead of trying to learn features directly, we try to learn a residual: the difference between the features a layer produces and that layer's input. ResNet does this using shortcut connections, or skip connections, which connect the input of the nth layer directly to some (n+x)th layer. Training networks of this form has been shown to resolve the problem of degrading accuracy.
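The idea above can be illustrated with a toy NumPy sketch: a residual block outputs F(x) + x, so its layers only need to learn the residual F(x) = H(x) - x, and with zero weights the block reduces exactly to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # input to the residual block
W = np.zeros((8, 8))                # weights of the stacked layers

def residual_block(x, W):
    f = np.maximum(W @ x, 0.0)      # F(x): a tiny ReLU layer
    return f + x                    # skip connection adds the input back

# With zero weights, F(x) = 0 and the block is exactly the identity,
# which is why extra residual layers need not hurt a shallower solution.
y = residual_block(x, W)
print(np.allclose(y, x))            # True
```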

The model was trained on the Skin Cancer ISIC dataset, a collection of 3,297 images labelled as benign or malignant.

Skin Cancer ISIC Dataset

ResNet50 gives high accuracy when tested against unseen images. The model we trained for detecting benign/malignant cancer achieved a training accuracy of 92% and a test accuracy of 85%, which suggests this CNN architecture is well suited to the classification task. We expect the accuracy could be further improved to 95%+ with access to greater computing power and better GPUs, more training images, and more training time (a larger number of epochs).

The model architecture used is DenseNet169 (Densely Connected Convolutional Network) + 6 Convolutional layers.

A CNN architecture is used for this prediction model as well. We found that while ResNet50 gave excellent results when predicting whether the cancer is benign or malignant (that is, when the model had 2 classes to predict from), it did not work as well when it had 7 classes to classify (i.e., the 7 types of skin lesions listed above). Other neural network architectures underperformed as well once the number of classes increased. After experimenting with other residual network architectures (ResNet101, ResNet152) and other convolutional architectures such as VGG16, VGG19, MobileNet, InceptionV3, and DenseNet121, we found that DenseNet169 + 6 convolutional layers gave the best accuracy.

Problems arise with CNNs as they go deeper: the path for information from the input layer to the output layer (and for the gradient in the opposite direction) becomes so long that the signal can vanish before reaching the other side (the vanishing gradient problem, again).

DenseNets simplify the connectivity pattern between layers and solve this problem by ensuring maximum information (and gradient) flow: within each dense block, every layer is connected directly to every subsequent layer. Instead of drawing representational power from extremely deep or wide architectures, DenseNets exploit the potential of the network through feature reuse. Furthermore, some variations of ResNets have shown that many layers contribute very little and can be dropped; the parameter count of ResNets is large because every layer has its own weights to learn. DenseNet layers, by contrast, are very narrow (e.g. 12 filters) and each adds only a small set of new feature maps.
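The connectivity pattern can be sketched with a toy NumPy example: each "layer" receives the concatenation of all earlier outputs and contributes only a small fixed number of new features (the growth rate, e.g. 12 filters in a real DenseNet). The 1-D arrays here are a stand-in for feature maps:

```python
import numpy as np

growth_rate = 12
features = [np.ones(growth_rate)]      # initial feature map (toy 1-D stand-in)

def dense_layer(all_prev):
    concat = np.concatenate(all_prev)  # layer sees every earlier output
    # a tiny fixed "layer" producing growth_rate new features
    return np.full(growth_rate, concat.mean())

for _ in range(4):                     # a 4-layer dense block
    features.append(dense_layer(features))

# channel count grows only linearly: initial + 4 * growth_rate
print(sum(f.size for f in features))   # 60
```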

The model was trained on the HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, containing 10,000 images across 7 classes.

HAM10000 Dataset

After rigorous testing with a number of models, adjusting the hyperparameters of each, varying the number of epochs, trying different loss functions and learning rates, and adding varying numbers of convolutional layers on top of the DenseNet169 base, we found the best accuracy. The modified architecture gave a training accuracy of 84% and a test accuracy of 71%, which we expect could rise to roughly 92% with access to greater computational power, better GPUs, larger training datasets, and more training time.

The backend for the system was developed using Flask. The Flask server pre-processes the received image data, obtains a prediction from the saved models, and maps the prediction to user-understandable terminology. The output is serialized as JSON and sent back to the request source.
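A minimal sketch of such a Flask endpoint, assuming a Keras model saved to H5. The route name, file name, preprocessing details, and labels are illustrative assumptions, not the project's actual code:

```python
import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
LABELS = ["Benign", "Malignant"]
_model = None

def get_model():
    # load the saved H5 model once, on first use, and cache it
    global _model
    if _model is None:
        _model = load_model("skin_cancer_model.h5")
    return _model

def label_prediction(prob):
    # map the raw sigmoid output to user-understandable terminology
    return {"label": LABELS[int(prob >= 0.5)], "probability": float(prob)}

@app.route("/predict", methods=["POST"])
def predict():
    # decode the uploaded image and scale it to the model's input shape
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    arr = np.asarray(img.resize((224, 224)), dtype="float32")[None] / 255.0
    prob = get_model().predict(arr)[0][0]
    return jsonify(label_prediction(prob))

if __name__ == "__main__":
    app.run()
```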

Re-training the models every time an image needed to be processed and classified would render the system extremely inefficient, cause latency of an unacceptable magnitude, and increase the server's susceptibility to crashes, resulting in an overall unpleasant user experience. Instead, the trained models with their tuned weights were saved into an H5 file, allowing them to be used at any time without re-training. This makes the whole process much faster and the system efficient.
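The save-once, load-many pattern looks like this in Keras; the tiny model here is a stand-in for the trained network, used only to keep the sketch self-contained:

```python
import os
import tempfile
import tensorflow as tf
from tensorflow.keras import layers

# stand-in for the trained network: a tiny toy model for illustration
inputs = tf.keras.Input(shape=(4,))
outputs = layers.Dense(1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)

path = os.path.join(tempfile.mkdtemp(), "model.h5")
model.save(path)                              # persist architecture + weights once

restored = tf.keras.models.load_model(path)   # fast to load; no re-training needed
```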

The front end of the system was developed using React JS and Bootstrap. The user can easily navigate its friendly UI and upload a picture of the affected area of the patient's skin. The React app sends the image data to the backend with an HTTP POST request and receives the JSON data containing the prediction as the response. Since the bulk of the system (Flask server and models) does not run on the user's machine, the app is fast, lightweight, and fluid to use.
