Part 2 of the 5-minute Papers series.

Note of thanks: I’d like to thank Aakash Nain for his valuable suggestions on the draft. If you’d like to know more about him, you can read the complete interview I had the honor of doing with him here.

Edit: Special thanks to Francisco Ingham for his suggestions as well. Francisco is also writing paper summaries; please check them out on his Medium profile if you’re interested.

I’d also like to thank Alexandra Chronopoulou, first author of this paper, for clarifying the questions I had about it.

This blog post is Part 2 in the 5-minute Papers series. In this post, I’ll share highlights from the paper “An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models” by Alexandra Chronopoulou et al.

The paper presents a simple approach, along with a few tricks, for applying transfer learning from pretrained language models. The authors compare their approach with recent publications and claim results competitive with the state of the art (SOTA).

This post discusses the context of the task, as well as 5 of the model details the authors share that are “embarrassingly simple” compared to similar transfer learning approaches.

Context:

Transfer learning in NLP involves using a pre-trained language model to capture information about a language and then fine-tuning this model for a target task.

Why Transfer Learning?

Instead of training from scratch, transfer learning allows us to leverage a model that is already good at language modeling and then use it for our target task, such as sentiment analysis.

Wait, What is Transfer Learning?

Transfer learning is a technique where we leverage a model that has already been trained to do a task, for example classifying images into categories (as in ImageNet). We then take the “knowledge” that this model has captured and “fine-tune”, or apply, it to our target task; for example, using the ImageNet model to become an expert at detecting HotDog or NotHotDog images.
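To make this concrete, here is a minimal PyTorch-style sketch of what “fine-tuning” means in practice: reuse a pretrained encoder, swap its language-modeling head for a task head, and train on the target task. All the class names, sizes, and hyperparameters below are my own illustrative choices, not something prescribed by the paper.

```python
import torch
import torch.nn as nn

# A hypothetical pretrained language model: embedding + LSTM encoder + LM head.
# Names and sizes are illustrative; they are not taken from the paper.
class PretrainedLM(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=300, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)  # predicts the next token

# Transfer learning: reuse the pretrained embedding + encoder and
# replace the LM head with a freshly initialised task-specific head.
class SentimentClassifier(nn.Module):
    def __init__(self, pretrained_lm, num_classes=2):
        super().__init__()
        self.embedding = pretrained_lm.embedding   # reused: "knowledge" carried over
        self.encoder = pretrained_lm.encoder       # reused
        self.classifier = nn.Linear(pretrained_lm.encoder.hidden_size, num_classes)  # new

    def forward(self, token_ids):
        x = self.embedding(token_ids)
        _, (h_n, _) = self.encoder(x)
        return self.classifier(h_n[-1])            # classify from the final hidden state

# Fine-tuning: gradients now come from the target-task loss, not next-word prediction.
model = SentimentClassifier(PretrainedLM())        # in practice, load pretrained weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```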

However, while this approach has shown much promise, there is still room for improvement.

Issues Addressed:

A few of the shortcomings of current approaches that the authors have tried to address are:

Approach

The authors introduce their approach as SiATL: Single-step Auxiliary Loss Transfer.

A language model is trained on 20 million English Twitter messages, and the transferred model is then evaluated on 5 different datasets covering different downstream tasks.

There are 5 techniques, or nuggets, that the authors use during training which simplify the transfer learning approach while showing performance gains. One of them, the auxiliary language-modeling loss that gives SiATL its name, is sketched below:
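Since the auxiliary loss is central enough to give SiATL its name, here is a rough sketch of what a joint objective of that kind looks like: the task loss plus a weighted language-modeling loss on the same batch. The decaying weight and all names below are my own illustrative assumptions; consult the paper for the exact formulation.

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def joint_loss(task_logits, labels, lm_logits, next_tokens, step,
               gamma0=0.2, decay=0.99):
    """Task loss plus a weighted auxiliary language-modeling loss.

    The decaying weight `gamma` is an illustrative assumption; see the
    paper for the exact schedule SiATL uses.
    """
    task_loss = ce(task_logits, labels)
    # Flatten (batch, seq_len, vocab) LM logits for token-level cross entropy.
    lm_loss = ce(lm_logits.reshape(-1, lm_logits.size(-1)), next_tokens.reshape(-1))
    gamma = gamma0 * (decay ** step)   # auxiliary weight shrinks as training proceeds
    return task_loss + gamma * lm_loss
```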

Results:

Here is a comparison of the authors’ approach against 5 other transfer learning approaches in NLP, across 5 datasets.

Source: Paper

The authors have also shared a comparison of their approach vs. ULMFiT across different training dataset sizes, showing very promising results on the smaller ones.

Conclusion & Personal thoughts:

The paper discusses a few simple techniques and offers very competitive results against other transfer learning approaches.

In particular, on smaller data subsets this approach (as per the authors’ claim) even outperforms ULMFiT.

I particularly liked the idea of defining an auxiliary loss. Sequential unfreezing, something I had learned in the fast.ai MOOC when working on image tasks, is another great idea highlighted in the paper; intuitively, it might help with faster convergence.
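If you haven’t run into sequential (gradual) unfreezing before, the mechanics are simple: start with only the new task head trainable and unfreeze deeper parameter groups one at a time as training goes on. Here is a small sketch; the grouping and timing are illustrative, not the paper’s exact recipe.

```python
NUM_EPOCHS = 5  # illustrative

def set_trainable(module, flag):
    """Enable or disable gradients for every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = flag

def train_with_sequential_unfreezing(model, train_one_epoch):
    """Unfreeze one parameter group per epoch, starting from the new task head.

    Assumes `model` exposes .classifier, .encoder and .embedding (as in the
    earlier sketch) and that `train_one_epoch` is your existing training step.
    """
    stages = [model.classifier, model.encoder, model.embedding]
    for m in stages:
        set_trainable(m, False)                 # freeze everything up front
    for epoch in range(NUM_EPOCHS):
        if epoch < len(stages):
            set_trainable(stages[epoch], True)  # thaw one more group this epoch
        train_one_epoch(model)
```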

On the other hand, I do not think ULMFiT’s learning rate schedule is really all that “sophisticated”; I think deploying it along with the ideas in this paper would help with even faster convergence.

As a personal experiment, I’d definitely try out these approaches on smaller subsets of IMDB along with the one-cycle scheduler from the fast.ai library.
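If you’d like to try something similar without pulling in the fastai API, PyTorch ships a one-cycle implementation as torch.optim.lr_scheduler.OneCycleLR; here is how it slots into a plain training loop. The model, data loader, and hyperparameters below are placeholders, not tuned settings.

```python
import torch
import torch.nn as nn

# Assumes `model` and `train_loader` already exist (e.g. a classifier and an
# IMDB-subset DataLoader); max_lr and the epoch count are placeholder values.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
epochs = 3
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(train_loader))

criterion = nn.CrossEntropyLoss()
for _ in range(epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()   # the one-cycle schedule advances every batch
```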

If you found this interesting and would like to be a part of My Learning Path, you can find me on Twitter here.

If you’re interested in reading about Deep Learning and Computer Vision news, you can check out my newsletter here.

If you’re interested in reading some of the best advice from Machine Learning Heroes (Practitioners, Researchers, and Kagglers), please click here.