Transfer Learning or Self-Supervised Learning? What would you choose?

A shortcoming of having a small training dataset is that the model overfits, degrading its performance on the test data. A popular technique is to use a pretrained model that has been trained on a larger dataset and therefore has a good representation network. We initialize the parameters of our model with the pretrained network and then fine-tune it by training on our smaller dataset. Both transfer learning and self-supervised learning use models pre-trained on a large source dataset and then fine-tuned on the target dataset. So which one is preferred?

What is the difference?

In transfer learning (TL), a model is pre-trained on a large dataset to perform some predictive task from the source domain and then fine-tuned to perform another task from the target domain. For example, in the source domain we pre-train the model for object recognition of vehicles, and in the target domain we fine-tune it to recognize the brand of cars.
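To make this concrete, here is a minimal fine-tuning sketch in PyTorch (my own illustration, not code from the study): it loads an ImageNet-pretrained ResNet-18, swaps the classification head for a hypothetical car-brand task, and fine-tunes the whole network on the target data. The number of brands and the training step are placeholders.

```python
# A minimal transfer-learning sketch in PyTorch/torchvision (illustrative only).
# The car-brand dataset and the number of brands are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a network pre-trained on the source domain
#    (ImageNet weights, torchvision >= 0.13 API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Replace the classification head for the target task (car-brand recognition).
num_brands = 10  # hypothetical number of target classes
model.fc = nn.Linear(model.fc.in_features, num_brands)

# 3. Fine-tune all parameters on the (smaller) labelled target dataset.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One supervised fine-tuning step on a batch from the target dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```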

Self-supervised learning (SSL) does the same thing; the difference between the two lies in how the models are pre-trained. In transfer learning, the model is pre-trained through supervised learning on a source dataset annotated by humans, while in self-supervised learning the model is pre-trained without labelled data. This unsupervised pre-training is carried out through auxiliary tasks designed by humans. A well-known example is the BERT NLP model, whose auxiliary tasks include predicting the current word from past words, from future words, or from both past and future words in a given sentence. The data is not labelled; rather, BERT generates the labels itself. Another example is the autoencoder, which reconstructs its input image.
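As a rough sketch of the autoencoder case (my own illustration, not the study's code), the snippet below pre-trains a small autoencoder in PyTorch: the "label" is the input image itself, and only the encoder is kept for downstream fine-tuning. The input shape (1 x 28 x 28) and layer sizes are arbitrary assumptions.

```python
# A minimal self-supervised sketch in PyTorch (illustrative only): the training
# signal is reconstruction of the input, so no human annotation is needed.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: the representation network we keep after pre-training.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 32),
        )
        # Decoder: only needed for the auxiliary reconstruction task.
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view_as(x)

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def pretrain_step(images):
    """One self-supervised step on an unlabelled batch: reconstruct the input."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(images), images)
    loss.backward()
    optimizer.step()
    return loss.item()

# After pre-training, model.encoder is reused and fine-tuned on the labelled target task.
```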

Fig 1: Workflow of transfer learning and self-supervised learning (Yang et al.).

Yang et al. performed a comprehensive study to show under which settings each technique outperforms the other. They chose datasets from different domains: common objects (ImageNet), scenery, birds/insects, and X-ray/CT scans. The source domain is where the model is pre-trained, and the target domain is where it is fine-tuned.

When the source domain and target domain are different

When the difference between the source domain (Nature) and the target domain (Pneumonia) was large, SSL performed better. TL performed well with the Nature dataset as source and the Flowers dataset as target, because many labels from the source overlapped with those in the target, which TL could exploit. SSL, however, was robust when the domains differed. SSL may therefore be preferable in real-world applications where the source and target domains differ by a large margin.

When the amount of pre-training data is small

When the amount of pre-training data was small, SSL outperformed TL on all target tasks. This is because TL is more likely to overfit when the dataset is small, and SSL is less sensitive to the amount of data than TL. A large dataset, however, enables TL to learn discriminative representations, so as the dataset grows TL improves and achieves better results than SSL.

When classes in the source dataset are imbalanced

When classes are imbalanced, both SSL and TL suffer: frequent images from one class introduce a bias that worsens generalization to the target tasks. TL is pre-trained using class labels and is therefore more sensitive to the label distribution, including imbalance. In contrast, SSL pre-training is label-free and hence less affected by the label distribution.

When using source, target, or combined data

When pre-trained on data combined from the source and target domains, SSL performs better, because the combined data helps it learn richer representations. TL on the combined data performs better than TL on target-only data: training on the combined data effectively leverages the source task to help learn the target task via multi-task learning. However, TL on the combined data performs worse than TL on source-only data. The reason is that TL on source-only first pre-trains on the source data and then fine-tunes on the target data, so the dedicated fine-tuning stage can adapt the representations specifically to the target task.
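A hypothetical sketch of the combined setting: source and target images are simply pooled into one pre-training set (random tensors stand in for the real image collections) and fed to a self-supervised step such as the autoencoder sketch above.

```python
# Hypothetical sketch: pool source and target images into one pre-training dataset.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

source_images = torch.randn(1000, 1, 28, 28)   # stand-in for the large source set
target_images = torch.randn(100, 1, 28, 28)    # stand-in for the small target set

combined = ConcatDataset([TensorDataset(source_images), TensorDataset(target_images)])
loader = DataLoader(combined, batch_size=64, shuffle=True)

for (images,) in loader:
    # run the self-supervised pre-training step (e.g. the autoencoder sketch above)
    pass
```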

Both methods have potential uses in different settings. Transfer learning is well researched and very popular in computer vision, while self-supervised learning leads in natural language processing thanks to the introduction of transformers.

References

Yang, Xingyi, et al. "Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms." arXiv preprint, 2020.