Artificial intelligence has come a long way since the first neural networks were created. Even so, building a new neural network architecture from the ground up still takes a lot of effort, time, and background knowledge. One way to work around this problem is transfer learning. With transfer learning we can perform the same task in a fraction of the time and effort required to build a model from scratch, with similar and often better performance.
There are roughly two different transfer learning methods. The first is based on extracting features with a pre-trained network and feeding them to a classical classification algorithm such as XGBoost or a decision tree.
The second method uses fine-tuning. Here the classification layer of the original network is replaced and, optionally, some of the convolutional layers are re-trained. This method is more commonly used in image processing applications.
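To make the first method concrete, here is a minimal sketch of feature extraction followed by a classical classifier. The 2,000-image subset, the global average pooling, and the decision tree settings are illustrative assumptions, not a fixed recipe.
import numpy as np
from tensorflow import keras
from tensorflow.keras.applications.resnet import ResNet50, preprocess_input
from sklearn.tree import DecisionTreeClassifier
# Load a small subset of CIFAR-10 just to illustrate the workflow
(x_train, y_train), _ = keras.datasets.cifar10.load_data()
x_small = preprocess_input(x_train[:2000].astype("float32"))
y_small = y_train[:2000].ravel()
# Use the frozen convolutional base as a fixed feature extractor
base = ResNet50(include_top=False, weights="imagenet", pooling="avg", input_shape=(32, 32, 3))
features = base.predict(x_small, batch_size=64, verbose=0)
# Train a classical classifier on the extracted feature vectors
clf = DecisionTreeClassifier(max_depth=10)
clf.fit(features, y_small)
print("Training accuracy:", clf.score(features, y_small))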
In this post I will first briefly explain what fine-tune transfer learning is, then discuss its advantages in terms of data and training efficiency, describe its working principle, walk through two examples, and finish with a conclusion.
Understanding Fine-Tune Transfer Learning
Fine-tune transfer learning is the process of using pre-trained models, which have already been trained in one domain, to boost performance on a related task in a different but similar domain. In essence, transfer learning leverages existing knowledge to learn new things faster and more efficiently. In this regard, transfer learning is like using what you already know about driving a car to learn how to ride a motorcycle.
Thanks to this, transfer learning allows you to build your model on the foundation of models that have already been trained on vast datasets. This is especially useful when the training dataset has too few samples for the complexity of the task, or when training a new model would require too much computational power, time, or both. Furthermore, you are more likely to end up with a successful model than with one built from the ground up, because pre-trained models have already learned a wide range of features.
Data Efficiency
Fine-tune transfer learning is usually applied to pre-trained networks such as ResNet, VGG-16, and AlexNet. Compared to custom neural networks built from scratch, the overall performance of these networks is significantly higher. There are two reasons why they perform so well.
The first reason is that these networks are very deep and complex. For instance, ResNet-50 contains roughly 50 convolutional layers, together with pooling layers and shortcut (skip) connections that improve its performance.
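As a quick sanity check of this depth claim, you can load ResNet-50 from Keras and count its convolutional layers and shortcut merge points. The exact counts depend on the Keras version, so treat this as an inspection aid rather than a specification.
import tensorflow as tf
from tensorflow.keras.applications.resnet import ResNet50
# Build the architecture only; no pre-trained weights are needed for inspection
model = ResNet50(weights=None)
conv_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)]
add_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Add)]  # shortcut merge points
print("Total layers:", len(model.layers))
print("Conv2D layers:", len(conv_layers))
print("Shortcut (Add) merges:", len(add_layers))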
The second reason is that these networks are usually trained on very large datasets such as ImageNet, which contains over 14 million images (the ILSVRC subset commonly used for training covers 1,000 classes).
As a result of both factors, such pre-trained networks have the potential to perform well on similar tasks outside the domain they were trained on. One of the key advantages of fine-tune transfer learning is that it realizes this potential even when the dataset used for transfer learning contains only a fraction of the samples the original network was trained on.
For this reason, when the training data is limited, or when the number of samples per class is not evenly distributed, fine-tune transfer learning achieves better results than shallower neural networks built from the ground up.
Training Efficiency
Even when the training sample size is large enough, it may still be wiser to consider fine-tune transfer learning for computer vision tasks. The reason is that pre-trained networks make it possible to achieve similar or better results with significantly less computational power and training time.
As mentioned earlier, pre-trained networks are very complex neural structures trained on very large image datasets. The natural consequence of both features is that each layer in these models learns to recognize different representations inside the images. These representations include, but are not limited to, recurring patterns, shapes, and even high-level concepts that correlate with task-specific needs. Thus, convolutional layers close to the input extract more general patterns from images, such as edges and shapes, while convolutional layers close to the output look for more task-related features.
For instance, if you train a network to differentiate dogs from cats, the foremost convolutional layers usually look for animal shapes, while the rearmost layers search for features specific to the animals, such as whiskers or pointy ears.
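One hedged way to see this hierarchy for yourself is to build a feature-extraction model that returns the activations of an early and a late convolutional layer. The layer names below are the ones ResNet-50 uses in current Keras releases; check model.summary() if they differ in your version.
import numpy as np
from tensorflow import keras
from tensorflow.keras.applications.resnet import ResNet50, preprocess_input
base = ResNet50(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
# Pick one early and one late layer; verify the names with base.summary() for your Keras version
early = base.get_layer("conv1_conv").output       # low-level features: edges, colour blobs
late = base.get_layer("conv5_block3_out").output  # high-level, more task-specific features
feature_model = keras.Model(inputs=base.input, outputs=[early, late])
# Run a random image through the extractor just to show the output shapes
dummy = preprocess_input(np.random.randint(0, 255, (1, 224, 224, 3)).astype("float32"))
early_maps, late_maps = feature_model.predict(dummy, verbose=0)
print("Early layer activations:", early_maps.shape)  # large spatial maps, few channels
print("Late layer activations:", late_maps.shape)    # small spatial maps, many channels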
With fine-tune transfer learning it is possible to increase performance by re-training those rearmost layers, the ones closest to the output, in addition to replacing the classification layer. This way you only have to train a few convolutional layers plus the MLP part, so the training requirement and time remain small compared to building a custom neural network, even in the worst case.
Working Principle of Transfer Learning
Fine-tune transfer learning is performed with two different methods.
In the first method the weights of the convolutional layers in the original pre-trained network are kept intact. Only the classification (MLP) layer is replaced to match the problem's requirements.
In the second method some convolutional layers in the pre-trained network are re-trained alongside the replaced classification (MLP) layer, starting from the rearmost convolutional layer, because those layers tend to learn more task-specific features. A more detailed explanation of why is given in the Training Efficiency section.
Since the second method involves training some of the convolutional layers, it is likely to achieve better results. However, it is also more resource intensive and requires more training time and more fine-tuning. That is why the first method is usually preferred; the second is only applied when fine-tuning the MLP layer alone does not yield the desired success rate.
An Example of Transfer Learning
To further illustrate transfer learning, let's perform image classification on the CIFAR-10 dataset with ResNet-50 using both fine-tuning methods.
To create our custom model with the first method, let us replace the original classification (MLP) layers with four dense layers containing 32, 16, 8, and 10 neurons. The output layer contains 10 neurons, equal to the number of class labels. A block schematic of the model is given as Figure-1, and the code for the first example is given as Example-1.
import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
#Importing the ResNet50 model and its preprocessing function from Keras
from tensorflow.keras.applications.resnet import ResNet50
from tensorflow.keras.applications.resnet import preprocess_input
# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
#Preprocess with the ResNet-specific built-in function instead of simple normalization
x_train = preprocess_input(x_train)
x_test = preprocess_input(x_test)
#Splitting dataset for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15, shuffle=True)
#Loading the model without its original classification head and setting the input shape to the CIFAR-10 resolution (32x32x3)
resnet50_model=ResNet50(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
#Freezing all ResNet50 layers so only the new classification head is trained
for layer in resnet50_model.layers:
    layer.trainable = False
#Creating our own model with resnet50 and adding classification layers
model=Sequential()
model.add(resnet50_model)
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
#Checking the model.
model.summary()
#Compiling and Training Model
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_val, y_val))
#Generate the y_pred vector by predicting a class for every test image
def generate_ypred(x_test):
    probs = model.predict(x_test, verbose=0)
    return np.argmax(probs, axis=1)
#Evaluate the model
y_pred = generate_ypred(x_test)
f1_macro=f1_score(y_test, y_pred, average='macro')
print('Macro f1 score is:',f1_macro)
With the first example we can achieve a macro F1 score of 0.619.
Now let us try the second method with another example. This time we re-train the last few convolutional layers in addition to replacing the classification layer. This way we can achieve a macro F1 score of 0.79. The code for the second example is given as Example-2.
import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
#Importing the ResNet50 model and its preprocessing function from Keras
from tensorflow.keras.applications.resnet import ResNet50
from tensorflow.keras.applications.resnet import preprocess_input
# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
#Preprocess with the ResNet-specific built-in function instead of simple normalization
x_train = preprocess_input(x_train)
x_test = preprocess_input(x_test)
#Splitting dataset for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15, shuffle=True)
#Loading the model without its original classification head and setting the input shape to the CIFAR-10 resolution (32x32x3)
resnet50_model=ResNet50(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
#Freezing all ResNet50 layers first
for layer in resnet50_model.layers:
    layer.trainable = False
#Then setting the last 42 layers as trainable so they are fine-tuned
for layer in resnet50_model.layers[-42:]:
    layer.trainable = True
#Printing each layer to confirm which ones will be re-trained
for i, layer in enumerate(resnet50_model.layers):
    print(i, layer.name, "-", layer.trainable)
#Creating our own model with resnet50 and adding classification layers
model=Sequential()
model.add(resnet50_model)
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
#Checking the model.
model.summary()
#Compiling and Training Model
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_val, y_val))
#Generate the y_pred vector by predicting a class for every test image
def generate_ypred(x_test):
    probs = model.predict(x_test, verbose=0)
    return np.argmax(probs, axis=1)
#Evaluate the model
y_pred = generate_ypred(x_test)
f1_macro=f1_score(y_test, y_pred, average='macro')
print('Macro f1 score is:',f1_macro)
Conclusion
To conclude, transfer learning is a cornerstone of modern AI-based computer vision, allowing us to build sophisticated models with less effort and data. When you work on any image processing task that requires a deep-learning-based solution, transfer learning can be a valuable ally, with a few caveats.
Overfitting to the target domain, domain mismatch, and limited computational resources are some of the commonly encountered problems. Strategies like regularization, domain adaptation, and distributed training can help mitigate these issues, as sketched below.
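As one illustration of the regularization point, a dropout layer and early stopping can be added to the classification head used in the examples above. The dropout rate and patience value here are assumptions to tune, not recommendations from the original setup.
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.applications.resnet import ResNet50
base = ResNet50(include_top=False, weights="imagenet", input_shape=(32, 32, 3))
base.trainable = False
# Same kind of head as the examples, with dropout between the dense layers to limit overfitting
model = Sequential([
    base,
    Flatten(),
    Dense(32, activation="relu"),
    Dropout(0.3),  # illustrative rate
    Dense(16, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# Early stopping halts training when the validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, callbacks=[early_stop])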
Overall transfer learning is a powerful tool for AI-based computer vision tasks. To understand transfer learning further you can study the following links:
1) https://medium.com/@kenneth.ca95/a-guide-to-transfer-learning-with-keras-using-resnet50-a81a4a28084b
2) https://www.tensorflow.org/tutorials/images/transfer_learning
3) https://www.youtube.com/watch?v=3ou0KYtDlOI&ab_channel=deeplizard
4) https://www.youtube.com/watch?v=5T-iXNNiwIs&ab_channel=deeplizard