An Image Processing Method: Transfer Learning with Fine Tuning 

Artificial intelligence has come a long way since the first neural networks were created. Even so, designing a new neural network architecture from the ground up still takes considerable effort, time and background knowledge. One way to circumvent this problem is transfer learning. With transfer learning we can perform the same task in a fraction of the time and effort needed to build a model from the ground up, often with similar or better performance.

There are roughly two transfer learning methods. The first extracts features from a pre-trained network and feeds them to a separate classification algorithm such as XGBoost or a decision tree.
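A minimal sketch of this feature-extraction approach, assuming TensorFlow/Keras and scikit-learn are installed. A frozen ResNet-50 with global average pooling turns each image into a 2048-value feature vector, and a decision tree is then trained on those vectors. The helper name `extract_features` and the random demo data are illustrative assumptions; `weights=None` is used in the demo only to skip the ImageNet download, so the demo features carry no real meaning.

```python
import numpy as np
import tensorflow as tf
from sklearn.tree import DecisionTreeClassifier


def extract_features(images, weights="imagenet"):
    """Hypothetical helper: map images of shape (n, h, w, 3) to (n, 2048) ResNet-50 features."""
    base = tf.keras.applications.resnet50.ResNet50(
        include_top=False,            # drop the ImageNet classification head
        weights=weights,
        input_shape=images.shape[1:],
        pooling="avg",                # global average pooling -> one vector per image
    )
    # astype() returns a copy, so preprocess_input does not modify the caller's array
    x = tf.keras.applications.resnet50.preprocess_input(images.astype("float32"))
    return base.predict(x, verbose=0)


# Demo on random data (assumption: 20 fake 32x32 RGB images with 10 possible labels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 32, 32, 3)).astype("uint8")
labels = rng.integers(0, 10, size=20)

features = extract_features(images, weights=None)  # use weights="imagenet" for real work
classifier = DecisionTreeClassifier(random_state=0).fit(features, labels)
print(features.shape)  # (20, 2048)
```

To use XGBoost as mentioned in the text, `xgboost.XGBClassifier` offers the same `fit`/`predict` interface and can replace the decision tree if that package is installed.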

The second method uses fine-tuning: the classification layer of the original network is replaced and, optionally, some of the convolutional layers are re-trained. This method is more commonly used in image processing applications.

In this post, I will first briefly explain what fine-tune transfer learning is and what advantages it offers in terms of data and training efficiency. Next, I will explain the working principle of the method and give two examples. Finally, I will conclude.

Understanding Fine Tune Transfer Learning 

Fine-tune transfer learning is the process of using pre-trained models, which have already been trained in one domain, to boost performance on a related task in a different but similar domain. In essence, transfer learning leverages existing knowledge to learn new things faster and more efficiently. In this regard, it is like using what you already know about driving a car when learning to ride a motorcycle.

Thanks to this, transfer learning lets you build your model on the foundation of models that have already been trained on vast datasets. This is especially useful when the training dataset has too few samples for the complexity of the task, or when training a new model would require too much computational power, too much time, or both. Furthermore, the result is more likely to be successful than a model built from the ground up, because pre-trained models have already learned a wide range of features.

Data Efficiency 

Fine-tune transfer learning is usually applied to pre-trained networks such as ResNet, VGG-16 and AlexNet. These networks generally perform significantly better than custom neural networks built from scratch, for two reasons.

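The first reason is that these networks are very deep and complex. For instance, ResNet-50 is 50 layers deep (49 convolutional layers and one fully connected layer), with a max-pooling layer after the first convolution, a global average-pooling layer before the classifier, and shortcut (residual) connections that help such a deep network train well.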

The second reason is that these networks are usually trained on very large datasets such as ImageNet, which contains over 14 million images; the ILSVRC subset typically used for pre-training has about 1.28 million training images in 1,000 classes.

Because of both factors, such pre-trained networks have the potential to perform well on similar tasks outside the domain they were trained on. A key advantage of fine-tune transfer learning is that it can realize this potential even when the dataset used for transfer learning is only a fraction of the size of the one the original network was trained on.
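To make the "shortcut" idea concrete, here is a minimal sketch of a residual block in the Keras functional API, assuming TensorFlow 2.x. The block adds its input back to the output of two convolutions, so those layers only need to learn a correction to the identity. This is a simplified two-convolution block; ResNet-50 itself uses a three-convolution "bottleneck" variant (1x1, 3x3, 1x1) plus a 1x1 projection on the shortcut whenever the shape changes. The name `residual_block` is hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, filters):
    """Simplified identity-shortcut block: output = ReLU(F(x) + x).

    Assumes x already has `filters` channels, so the shortcut needs no projection.
    """
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])  # the shortcut (skip) connection
    return layers.Activation("relu")(y)


inputs = tf.keras.Input(shape=(32, 32, 64))
block = tf.keras.Model(inputs, residual_block(inputs, 64))
out = block.predict(np.ones((2, 32, 32, 64), dtype="float32"), verbose=0)
print(out.shape)  # (2, 32, 32, 64)
```

Because the output shape equals the input shape, blocks like this can be stacked many times, which is how ResNet reaches its depth.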

For this reason, when sample data is insufficient or the classes are imbalanced, fine-tune transfer learning usually achieves better results than shallower neural networks built from the ground up.

Training Efficiency 

Even when the training set is large enough, it may still be wiser to consider fine-tune transfer learning for computer vision tasks, because pre-trained networks can reach similar or better results with significantly less computational power and training time.

As mentioned earlier, pre-trained networks are very complex architectures trained on very large image datasets. As a consequence, the layers in these models learn to recognize different representations within images, including but not limited to recurring patterns, shapes and even high-level concepts that are more closely tied to task-specific needs. Convolutional layers close to the input extract more general features such as edges and simple shapes, while convolutional layers close to the output look for more task-specific features.

For instance, if you train a network to distinguish dogs from cats, the foremost convolutional layers usually respond to general edges and outlines, while the rearmost convolutional layers look for animal-specific features such as whiskers or pointy ears.
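One way to look at this hierarchy in practice is to read out an early and a late activation from the Keras ResNet-50 for a CIFAR-10-sized input. The sketch below assumes TensorFlow 2.x and the layer names used by `tf.keras.applications.resnet50.ResNet50` (`conv1_relu` after the first convolution, `conv5_block3_out` after the last block). It uses `weights=None` only to avoid the download; with `weights="imagenet"` the same probe returns the pre-trained activations. The shapes alone do not prove what each layer detects, but they show how the network trades spatial detail for many more, increasingly abstract channels as depth grows.

```python
import numpy as np
import tensorflow as tf

base = tf.keras.applications.resnet50.ResNet50(
    include_top=False, weights=None, input_shape=(32, 32, 3))

# A second model that shares the same layers but returns two intermediate outputs
probe = tf.keras.Model(
    inputs=base.input,
    outputs=[base.get_layer("conv1_relu").output,         # near the input
             base.get_layer("conv5_block3_out").output])  # near the output

early, late = probe.predict(np.zeros((1, 32, 32, 3), dtype="float32"), verbose=0)
print(early.shape, late.shape)  # (1, 16, 16, 64) (1, 1, 1, 2048)
```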

With fine-tune transfer learning, performance can be improved further by re-training these rearmost layers, which are closest to the output, in addition to replacing the classification layer. Since only a few convolutional layers and the MLP part need to be trained, the training requirements and time remain small compared to building a custom neural network, even in the worst case.
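A rough way to quantify the training requirement is to count trainable parameters. The sketch below, assuming TensorFlow 2.x, builds ResNet-50 with the 32-16-8-10 head used later in this post and counts trainable parameters for three settings: head only, head plus the last ResNet stage (layers whose names start with `conv5_`, a naming convention of the Keras implementation), and the whole network. This is only a proxy: the forward pass still runs through every layer, so training time does not shrink in proportion.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def count_trainable(model):
    """Total number of scalar weights the optimizer would update."""
    return sum(int(np.prod(tuple(w.shape))) for w in model.trainable_weights)


base = tf.keras.applications.resnet50.ResNet50(
    include_top=False, weights=None, input_shape=(32, 32, 3))  # weights=None: no download
inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Flatten()(base(inputs))
for units in (32, 16, 8):
    x = layers.Dense(units, activation="relu")(x)
model = tf.keras.Model(inputs, layers.Dense(10, activation="softmax")(x))

counts = {}
# prefix None -> freeze the whole base; "" matches every name, so nothing is frozen
for label, prefix in [("head only", None), ("head + conv5 stage", "conv5_"), ("whole network", "")]:
    for layer in base.layers:
        layer.trainable = prefix is not None and layer.name.startswith(prefix)
    counts[label] = count_trainable(model)
    print(f"{label:>20}: {counts[label]:,} trainable parameters")
```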

Working Principle of Transfer Learning 

Fine-tune transfer learning can be performed in two different ways.

In the first method, the weights of the convolutional layers in the original pre-trained network are kept intact (frozen). Only the classification (MLP) layers are replaced to match the problem's requirements.
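In Keras, the whole base can also be frozen in one line with `base.trainable = False`, which has the same effect as looping over its layers. The sketch below, assuming TensorFlow 2.x, follows the pattern of the TensorFlow transfer-learning tutorial: the frozen base is called with `training=False` so its BatchNormalization layers stay in inference mode, which matters if some layers are unfrozen later. `build_frozen_model` is a hypothetical helper; the head mirrors the 32-16-8-10 layers used in this post.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_frozen_model(num_classes=10, weights="imagenet"):
    """Frozen ResNet-50 base plus a new, trainable classification head."""
    base = tf.keras.applications.resnet50.ResNet50(
        include_top=False, weights=weights, input_shape=(32, 32, 3))
    base.trainable = False  # freezes every layer of the base at once

    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = base(inputs, training=False)  # keep BatchNormalization in inference mode
    x = layers.Flatten()(x)
    for units in (32, 16, 8):
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


model, base = build_frozen_model(weights=None)  # weights=None only to skip the download
print(len(base.trainable_weights), len(model.trainable_weights))  # 0 8
```

Only the four Dense layers (a kernel and a bias each) remain trainable.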

In the second method, some convolutional layers of the pre-trained network are re-trained alongside the replaced MLP (classification) layers, starting from the rearmost convolutional layer, since those layers tend to learn more task-specific features. The Training Efficiency section explains why in more detail.
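A minimal sketch of this second method, assuming TensorFlow 2.x: `unfreeze_last` (a hypothetical helper) freezes the base and then re-enables only its last `n` layers. Python's slice semantics matter here: `layers[-n:]` is the last `n` layers, whereas `layers[:-n]` is everything except them. The option to keep BatchNormalization layers frozen follows a common practice in Keras fine-tuning examples; treat it as a tunable choice rather than a rule.

```python
import tensorflow as tf


def unfreeze_last(base, n, keep_batchnorm_frozen=True):
    """Freeze all layers of `base`, then make its last `n` layers trainable."""
    if not 0 < n <= len(base.layers):
        raise ValueError("n must be between 1 and the number of layers")
    base.trainable = True  # the container itself must be trainable for layer flags to matter
    for layer in base.layers[:-n]:  # everything except the last n layers
        layer.trainable = False
    for layer in base.layers[-n:]:  # the last n layers
        is_bn = isinstance(layer, tf.keras.layers.BatchNormalization)
        layer.trainable = not (keep_batchnorm_frozen and is_bn)


base = tf.keras.applications.resnet50.ResNet50(
    include_top=False, weights=None, input_shape=(32, 32, 3))  # weights=None: no download
unfreeze_last(base, 42)
trainable_names = [layer.name for layer in base.layers if layer.trainable]
print(len(trainable_names), "trainable layers, last one:", trainable_names[-1])
```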

Since the second method also trains some of the convolutional layers, it is likely to achieve better results. However, it is more resource-intensive and requires more training time and more hyperparameter tuning. That is why the first method is usually preferred, and the second method is typically applied only when training the MLP layers alone does not yield the desired accuracy.
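Because the second method is more expensive and easier to destabilize, a common recipe (used in the TensorFlow transfer-learning tutorial) is to run it as a second phase: first train only the new head with the base frozen, then unfreeze the last layers and continue with a much lower learning rate. The sketch below assumes TensorFlow 2.x; the function names, the single softmax head, the default epoch counts and the 1e-5 fine-tuning learning rate are illustrative assumptions, not values from this post. In Keras, `compile()` must be called again after changing `trainable` flags for the change to take effect.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_model(weights="imagenet"):
    base = tf.keras.applications.resnet50.ResNet50(
        include_top=False, weights=weights, input_shape=(32, 32, 3))
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = base(inputs, training=False)  # BatchNormalization stays in inference mode
    outputs = layers.Dense(10, activation="softmax")(layers.Flatten()(x))
    return tf.keras.Model(inputs, outputs), base


def two_phase_fine_tune(model, base, train_data, val_data, n_unfreeze=42,
                        head_epochs=5, fine_epochs=5, fine_lr=1e-5):
    """Phase 1: train only the head. Phase 2: also train the last n_unfreeze base layers."""
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(*train_data, epochs=head_epochs, validation_data=val_data, verbose=0)

    base.trainable = True
    for layer in base.layers[:-n_unfreeze]:
        layer.trainable = False
    # Recompile so the new trainable flags and the lower learning rate take effect
    model.compile(optimizer=tf.keras.optimizers.Adam(fine_lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model.fit(*train_data, epochs=fine_epochs, validation_data=val_data, verbose=0)


# Tiny offline smoke test on random data (weights=None skips the ImageNet download)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 32, 3)).astype("float32")
y = rng.integers(0, 10, size=(8, 1))
model, base = build_model(weights=None)
history = two_phase_fine_tune(model, base, (x, y), (x, y), head_epochs=1, fine_epochs=1)
print(sorted(history.history))
```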

An Example of Transfer Learning 

To illustrate, let's perform image classification on the CIFAR-10 dataset using ResNet-50 with both fine-tuning methods.

To create our custom model with the first method, we replace the original classification (MLP) layers with four fully connected (Dense) layers containing 32, 16, 8 and 10 neurons. The 10 neurons of the output layer correspond to the 10 class labels of CIFAR-10. The block schematic of the model is given in Figure 1, and the code for the first example is given as Example-1.
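As a quick check of this head's size: for a 32x32 input, ResNet-50's last block outputs a 1x1x2048 tensor, so `Flatten` yields 2048 values, and each Dense layer has (inputs x outputs) weights plus one bias per neuron. The lines below simply do that arithmetic in plain Python.

```python
# Layer widths: flattened ResNet-50 output, then the 32-16-8-10 head
sizes = [2048, 32, 16, 8, 10]

# A Dense layer with n_in inputs and n_out neurons has n_in * n_out weights + n_out biases
params = [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]
print(params, sum(params))  # [65568, 528, 136, 90] 66322
```

So with the convolutional base frozen, only about 66 thousand parameters are trained, a small fraction of the millions of weights in ResNet-50 itself.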

Figure 1: Block Schematic of the used Neural Network
Example-1: Transfer learning with a frozen ResNet-50 base

import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Importing the ResNet50 model and its matching preprocessing function from Keras
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Preprocess with ResNet50's built-in function (RGB->BGR, ImageNet mean subtraction)
# instead of scaling pixel values to [0, 1]
x_train = preprocess_input(x_train)
x_test = preprocess_input(x_test)

# Splitting 15% of the training set off for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15, shuffle=True)

# Loading ResNet50 without its ImageNet classification head (include_top=False)
# and setting the input shape to the CIFAR-10 resolution (32x32x3)
resnet50_model = ResNet50(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# Freezing all ResNet50 parameters, so only the new classification layers are trained
for layer in resnet50_model.layers:
    layer.trainable = False

# Creating our own model with ResNet50 and adding classification layers
model = Sequential()
model.add(resnet50_model)
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

# Checking the model
model.summary()

# Compiling and training the model
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_val, y_val))

# Generate the y_pred vector: predict the whole test set in batches
# and take the most likely class for each image
y_pred = np.argmax(model.predict(x_test, verbose=0), axis=1)

# Evaluate the model (y_test has shape (n, 1), so flatten it first)
f1_macro = f1_score(y_test.ravel(), y_pred, average='macro')
print('Macro f1 score is:', f1_macro)

With the first example, we achieve a macro F1 score of 0.619.
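The score reported here is the macro-averaged F1: F1 is computed separately for each class and the per-class scores are averaged with equal weight, so every class counts the same regardless of its size. A small sketch, assuming NumPy and scikit-learn, that computes it by hand on hypothetical toy labels (not CIFAR-10 results) and compares it with `sklearn.metrics.f1_score`:

```python
import numpy as np
from sklearn.metrics import f1_score


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Toy labels for 3 classes; every class appears, so sklearn averages over the same 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

manual = macro_f1(y_true, y_pred, num_classes=3)
sk = f1_score(y_true, y_pred, average="macro")
print(round(manual, 4), round(sk, 4))  # 0.6556 0.6556
```

Per class, the F1 scores here are 0.5, 0.8 and 2/3; their plain average is the macro F1.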

Now let us try the second method with another example. This time we also re-train the last 42 Keras layers of ResNet-50, which cover its final convolutional blocks, in addition to replacing the classification layer. This way we achieve a macro F1 score of 0.79. The code for the second example is given as Example-2.

Example-2: Fine-tuning the last 42 layers of ResNet-50

import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Importing the ResNet50 model and its matching preprocessing function from Keras
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Preprocess with ResNet50's built-in function (RGB->BGR, ImageNet mean subtraction)
# instead of scaling pixel values to [0, 1]
x_train = preprocess_input(x_train)
x_test = preprocess_input(x_test)

# Splitting 15% of the training set off for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15, shuffle=True)

# Loading ResNet50 without its ImageNet classification head (include_top=False)
# and setting the input shape to the CIFAR-10 resolution (32x32x3)
resnet50_model = ResNet50(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# First freeze every ResNet50 layer
for layer in resnet50_model.layers:
    layer.trainable = False

# Then set the last 42 layers as trainable.
# layers[-42:] selects the LAST 42 layers; layers[:-42] would select all layers EXCEPT the last 42.
for layer in resnet50_model.layers[-42:]:
    layer.trainable = True

# Print each layer's index, name and trainable flag to verify the split
for i, layer in enumerate(resnet50_model.layers):
    print(i, layer.name, "-", layer.trainable)

# Creating our own model with ResNet50 and adding classification layers
model = Sequential()
model.add(resnet50_model)
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

# Checking the model
model.summary()

# Compiling and training the model
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_val, y_val))

# Generate the y_pred vector: predict the whole test set in batches
# and take the most likely class for each image
y_pred = np.argmax(model.predict(x_test, verbose=0), axis=1)

# Evaluate the model (y_test has shape (n, 1), so flatten it first)
f1_macro = f1_score(y_test.ravel(), y_pred, average='macro')
print('Macro f1 score is:', f1_macro)

Conclusion 

To conclude, transfer learning is a cornerstone of modern AI-based computer vision, allowing us to build sophisticated models with less effort and data. Whenever you work on an image processing task that calls for a deep-learning-based solution, transfer learning can be a valuable ally, with a few caveats.

Overfitting to the target domain, domain mismatch and limited computational resources are some of the commonly encountered problems. Strategies such as regularization, domain adaptation and distributed training can help mitigate them.
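As one example of the regularization strategies mentioned above, the head can be given dropout and an L2 weight penalty, and training can stop early once the validation loss stops improving. This is a sketch assuming TensorFlow 2.x; the dropout rate, L2 factor and patience are illustrative values rather than tuned ones, and `build_regularized_model` is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_regularized_model(weights="imagenet", num_classes=10, dropout_rate=0.3, l2_factor=1e-4):
    base = tf.keras.applications.resnet50.ResNet50(
        include_top=False, weights=weights, input_shape=(32, 32, 3))
    base.trainable = False

    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    x = layers.Dropout(dropout_rate)(x)  # randomly zeroes features, during training only
    x = layers.Dense(32, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.L2(l2_factor))(x)  # penalizes large weights
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


# Stop when validation loss has not improved for 3 epochs and roll back to the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model = build_regularized_model(weights=None)  # weights=None only to skip the download
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, callbacks=[early_stop])
```

For the computational-resource issue, building and compiling the model inside `with tf.distribute.MirroredStrategy().scope():` lets Keras spread training across several GPUs on one machine.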

Overall, transfer learning is a powerful tool for AI-based computer vision tasks. To learn more about transfer learning, you can study the following links:

1) https://medium.com/@kenneth.ca95/a-guide-to-transfer-learning-with-keras-using-resnet50-a81a4a28084b 

2) https://www.tensorflow.org/tutorials/images/transfer_learning 

3) https://www.youtube.com/watch?v=3ou0KYtDlOI&ab_channel=deeplizard 

4) https://www.youtube.com/watch?v=5T-iXNNiwIs&ab_channel=deeplizard