Transfer Learning or Self-Supervised Learning? What would you choose?

A shortcoming of having a small training dataset is that the model overfits, degrading its performance on the test data. A popular technique is to use a pretrained model that has been trained on a larger dataset and therefore has a good representation network. We initialize the parameters of our model with the pretrained network and then fine-tune it by training on our smaller dataset. Both transfer learning and self-supervised learning use models pre-trained on a large source dataset and then fine-tuned on the target dataset. So which one is preferred?

What is the difference?

In transfer learning (TL), a model is pre-trained on a large dataset to perform some predictive task from the source domain and then fine-tuned to perform another task from the target domain. For example, in the source domain we pre-train the model for object recognition of vehicles, and in the target domain we fine-tune it to recognize the brand of cars.
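To make this concrete, here is a minimal fine-tuning sketch in PyTorch (my own illustration, not code from the study): it loads an ImageNet-pretrained ResNet-18, swaps the classification head for a hypothetical car-brand task, and fine-tunes the whole network on the target data. The number of brands and the training step are placeholders.

```python
# A minimal transfer-learning sketch in PyTorch/torchvision (illustrative only).
# The car-brand dataset and the number of brands are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a network pre-trained on the source domain
#    (ImageNet weights, torchvision >= 0.13 API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Replace the classification head for the target task (car-brand recognition).
num_brands = 10  # hypothetical number of target classes
model.fc = nn.Linear(model.fc.in_features, num_brands)

# 3. Fine-tune all parameters on the (smaller) labelled target dataset.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One supervised fine-tuning step on a batch from the target dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```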

Self-supervised learning (SSL) does the same thing; the difference between the two lies in how the models are pre-trained. In transfer learning, the model is pre-trained through supervised learning on a source dataset annotated by humans, while in self-supervised learning the model is pre-trained without labelled data. This unsupervised pre-training is carried out through auxiliary tasks designed by humans. A well-known example is the BERT NLP model, whose auxiliary tasks include predicting the current word from past words, from future words, or from both past and future words in a given sentence. The data is not labelled; rather, BERT generates the labels itself. Another example is the autoencoder, which reconstructs its input image.
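As a rough sketch of the autoencoder case (my own illustration, not the study's code), the snippet below pre-trains a small autoencoder in PyTorch: the "label" is the input image itself, and only the encoder is kept for downstream fine-tuning. The input shape (1 x 28 x 28) and layer sizes are arbitrary assumptions.

```python
# A minimal self-supervised sketch in PyTorch (illustrative only): the training
# signal is reconstruction of the input, so no human annotation is needed.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: the representation network we keep after pre-training.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 32),
        )
        # Decoder: only needed for the auxiliary reconstruction task.
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view_as(x)

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def pretrain_step(images):
    """One self-supervised step on an unlabelled batch: reconstruct the input."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(images), images)
    loss.backward()
    optimizer.step()
    return loss.item()

# After pre-training, model.encoder is reused and fine-tuned on the labelled target task.
```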

Fig 1: Workflow of transfer learning and self-supervised learning (Yang et al.).

Yang et al. performed a comprehensive study to show under which settings each technique outperforms the other. They chose datasets from different domains: common objects (ImageNet), scenery, birds/insects, and X-ray/CT scans. The source domain is where the model is pre-trained, and the target domain is where it is fine-tuned.

When the source domain and target domain are different

When the difference between the source domain (Nature) and the target domain (Pneumonia) was large, SSL performed better. TL performed well with the Nature dataset as source and the Flowers dataset as target, because many labels from the source overlapped with those in the target, which TL could exploit. SSL, however, was robust when the domains differed. SSL may therefore be preferable in real-world applications where the source and target domains differ by a large margin.

When the amount of pre-training data is small

When the amount of pre-training data was small, SSL outperformed TL on all target tasks. This is because TL is more likely to overfit when the dataset is small, and SSL is less sensitive to the amount of data than TL. A large dataset, however, enables TL to learn discriminative representations, so as the dataset grows TL improves and achieves better results than SSL.

When classes in the source dataset are imbalanced

When classes are imbalanced, both SSL and TL suffer: frequent images from one class introduce a bias that worsens generalization to the target tasks. TL is pre-trained using class labels and is therefore more sensitive to the label distribution, including imbalance. In contrast, SSL pre-training is label-free and hence less affected by the label distribution.

When using source, target, or combined data

When pre-trained on data combined from the source and target domains, SSL performs better, because the combined data helps it learn richer representations. TL on the combined data performs better than TL on target-only data: training on the combined data effectively leverages the source task to help learn the target task via multi-task learning. However, TL on the combined data performs worse than TL on source-only data. The reason is that TL on source-only first pre-trains on the source data and then fine-tunes on the target data, so the dedicated fine-tuning stage can adapt the representations specifically to the target task.
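A hypothetical sketch of the combined setting: source and target images are simply pooled into one pre-training set (random tensors stand in for the real image collections) and fed to a self-supervised step such as the autoencoder sketch above.

```python
# Hypothetical sketch: pool source and target images into one pre-training dataset.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

source_images = torch.randn(1000, 1, 28, 28)   # stand-in for the large source set
target_images = torch.randn(100, 1, 28, 28)    # stand-in for the small target set

combined = ConcatDataset([TensorDataset(source_images), TensorDataset(target_images)])
loader = DataLoader(combined, batch_size=64, shuffle=True)

for (images,) in loader:
    # run the self-supervised pre-training step (e.g. the autoencoder sketch above)
    pass
```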

Both methods have potential uses in different settings. Transfer learning is well researched and very popular in computer vision, while self-supervised learning leads in natural language processing thanks to the introduction of transformers.

References

Yang, Xingyi, et al. "Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms." arXiv preprint, 2020.