A guide to transfer learning with Keras using ResNet50

In this blog post we provide a guide to transfer learning: the main aspects to take into account in the process, some tips, and an example implementation in Keras using ResNet50 as the pretrained model. The task is to transfer the learning of a ResNet50 trained on ImageNet to a model that identifies images from the CIFAR-10 dataset. Several methods were tested to increase accuracy, and we present them to show the variety of options available for training. With the final model of this blog we reach an accuracy of 94% on the test set.

Learning something new takes time and practice, but we find similar tasks easy. This is thanks to the association involved in human learning: we have the capability to identify patterns from previous knowledge and apply them to new learning.

When we meet a person who is faster or better than us at something like a video game or coding, it is almost certain that they have done it before or that there is an association with a similar previous activity.

If we know how to ride a bike, we don’t need to learn from zero how to ride a motorbike. If we know how to play football, we don’t need to learn from zero how to play futsal. If we know how to play the piano, we don’t need to learn from zero how to play another instrument.

The same applies to machines: if we train a model on one dataset, it is not necessary to retrain the whole model from zero to adjust it to a new, similar dataset. Both ImageNet and CIFAR-10 contain images that can train a model to classify images, so it is very appealing to save training time (which can be very long) by starting from the weights of a previously trained model. We will go through this concept of transfer learning with everything you need to build a model of your own.

Setting up our environment

The first thing we do is import the needed libraries with the lines of code below. Pinning the version to 1.x is optional; without that first line Colab will run the latest version of TensorFlow. We also use NumPy and a TensorFlow function, but depending on how you build your own model it may not be necessary to import them.
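A minimal sketch of such an import block, assuming a Colab notebook (the %tensorflow_version magic only has effect there):

```python
# %tensorflow_version 1.x  # optional Colab magic; leave it out to use the latest TensorFlow

import numpy as np
import tensorflow as tf
from tensorflow import keras
```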

Dataset

CIFAR-10 is a dataset of 60,000 32x32 colour images grouped into 10 classes, which means 6,000 images per class. It is split into 50,000 training images and 10,000 test images.

The categories are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. We can take advantage of the fact that these categories, and many more, are in the ImageNet collection.

To load the dataset with Keras, we use:
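A sketch of the loading step, using the cifar10 helper bundled with Keras:

```python
from tensorflow.keras.datasets import cifar10

# Download CIFAR-10 and split it into its predefined train/test partitions.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)
```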

Preprocess
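A plausible sketch of this step, assuming the predefined function is preprocess_input from the ResNet50 module, plus one-hot encoding of the labels:

```python
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.utils import to_categorical

# ResNet50 expects inputs preprocessed the same way as its ImageNet training
# data, which preprocess_input takes care of.
x_train = preprocess_input(x_train.astype('float32'))
x_test = preprocess_input(x_test.astype('float32'))

# One-hot encode the integer labels for the 10 CIFAR-10 classes.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```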

Preprocess the data using the predefined function from Keras
Using the weights of a pretrained ResNet50

From this point it all comes down to testing and a bit of creativity. The starting point is very advantageous since we already have weights that work for image classification, but since we are using them on a completely new dataset, some adjustments are needed. Our objective is to build a model with high classification accuracy: if an image of a dog is presented, it should identify it as a dog and not as a truck, for example.

Let’s say we want to achieve an accuracy of more than 88% on the training data, but we also want to avoid overfitting. How do we get this? At this point our models may diverge; this is where we test which tools we can use for that objective. The important thing here is to learn about transfer learning and building robust models. We follow one example, but there are different approaches, which we will discuss.

The two approaches you can take in transfer learning are freezing the layers of the pretrained model or fine-tuning the whole thing.

This refers to how you use the layers of your pretrained model. We already have a huge number of parameters because of the number of layers in ResNet50, but we also have calibrated weights. We can choose to ‘freeze’ those layers (as many as we can) so their values don’t change, saving time and computational cost. However, since the new dataset is entirely different, it is not a bad idea to train the whole model.

In this case, we ‘freeze’ all layers except for the last block of the ResNet50. The way to do this in Keras is with:
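A sketch of one way to do this, assuming the TF 2.x Keras ResNet50, where the layers of the final block have names starting with 'conv5':

```python
from tensorflow.keras.applications import ResNet50

# Load ResNet50 with its ImageNet weights and without the original
# 1000-class classification head.
base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=(224, 224, 3))

# Freeze every layer except those of the last block ('conv5...');
# frozen layers keep their ImageNet weights during training.
for layer in base_model.layers:
    layer.trainable = layer.name.startswith('conv5')
```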

Set some layers as non-trainable

We can check that we did it correctly with:
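For instance, with a loop over the layers of the base model:

```python
# Print each layer with its trainable flag: False means the layer is frozen,
# True means its weights will be updated during training.
for layer in base_model.layers:
    print(layer.name, layer.trainable)
```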

Print the layers to check which are trainable

The output lists each layer together with its flag (we omit most of them here). False means the layer is ‘frozen’, i.e. not trainable, and True means that when we run our model its weights are going to be adjusted.

Next, we need to connect our pretrained model with the new layers of our model. We can use global pooling or a flatten layer to connect the dimensions of the previous layers with the new ones. With just a flatten layer and a dense layer with softmax we can close the model and start classifying.
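A minimal sketch of that connection, reusing the base_model defined above (the input resizing discussed later is left out here):

```python
from tensorflow.keras import layers, models

# Flatten the convolutional features of the base and close the model with
# a 10-way softmax classifier for CIFAR-10.
model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
```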

We have regularizers to help us avoid overfitting and optimizers to get results faster. Each of them can also affect our accuracy, so here is what to take into account. The most important in this case are dropout and batch normalization on the regularization side, and the choice of optimizer (SGD, Adam, RMSprop) and its learning rate on the optimization side.

We obtained an accuracy of 94% on the training set and 90% on validation with 10 epochs. By the 8th epoch the values are very similar, and it is interesting to note that in the first epochs the validation accuracy is higher than the training accuracy. This is because of the use of dropout, which in Keras behaves differently during training and testing: at test time all the units are kept and dropout is turned off, resulting in better accuracy. This readjusts in the last epochs, since the model keeps changing during training.
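A tiny illustration of that behaviour (the values here are illustrative, not from the article): a Keras Dropout layer only zeroes activations when called in training mode and acts as an identity at inference time.

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 6), dtype='float32')

# With training=True roughly half the activations are zeroed (and the kept
# ones rescaled); with training=False the layer passes the inputs through.
print(drop(x, training=True).numpy())
print(drop(x, training=False).numpy())
```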

We found that batch normalization and dropout greatly reduce overfitting and help get better accuracy on the validation set. The method of ‘freezing layers’ allows faster computation but hurts accuracy, so it was necessary to add dense layers at the end. The shape of those layers keeps part of the structure of the original ResNet50, as if the head were a continuation of it, but with the features we mentioned.
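A sketch of a head along those lines; the layer sizes (512, 256) and dropout rate are illustrative choices, not the article's exact architecture:

```python
from tensorflow.keras import layers, models

# Extra dense layers with batch normalization and dropout between them,
# closed by a 10-way softmax classifier.
model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])

model.summary()
```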

For ResNet50, what helped most to achieve a high accuracy was to resize the input from 32 x 32 to 224 x 224. This is because of how the model was constructed: in this sense it was not compatible with the dataset, but it was easy to solve by fitting the input to the original size of the architecture. There was the option of using UpSampling for this task, but we found that using a Keras Lambda layer was much faster.
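A sketch of such a Lambda layer at the front of the model, assuming TF 2.x where tf.image.resize is available; the simple head shown here stands in for whichever head you use:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Upscale the 32x32 CIFAR-10 images to the 224x224 input size ResNet50 was
# designed for; doing it inside the model keeps the resize on the GPU.
model = models.Sequential([
    layers.Lambda(lambda img: tf.image.resize(img, (224, 224)),
                  input_shape=(32, 32, 3)),
    base_model,
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
```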

Training and validation accuracy. We can see the behaviour of the dropout technique, whose gap narrows as the epochs advance.
Training and validation loss. There is a point after which more training does not change the results much.

We confirmed that ResNet50 works best with input images of 224 x 224. As CIFAR-10 has 32 x 32 images, it was necessary to resize them. With this adjustment alone the model can achieve a high accuracy; I think it was the most important one for ResNet50.

A good recommendation when building a model with transfer learning is to first test optimizers to get a low bias and good results on the training set, then reach for regularizers if you see overfitting on the validation set.

The discussion about freezing layers of the pretrained model continues. It reduces computation time and reduces overfitting, but lowers accuracy. When the new dataset is very different from the dataset used for pretraining, it may be necessary to leave more layers trainable for adjustment.

When selecting hyperparameters, it is important in transfer learning to use a low learning rate to take advantage of the weights of the pretrained model. This choice, like the choice of optimizer (SGD, Adam, RMSprop), will impact the number of epochs needed to get a successfully trained model.
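A sketch of what the compile and fit step could look like under those guidelines; the optimizer, learning rate, batch size and validation split are illustrative choices, not the article's exact values:

```python
from tensorflow.keras.optimizers import SGD

# A low learning rate avoids destroying the pretrained ImageNet weights
# in the first updates.
model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=10,
                    validation_split=0.1)
```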
