There are soooo many academic and industrial papers in the fields of machine learning, natural language processing (NLP) and computer systems nowadays, and even a single conference’s worth is overwhelming. In this post, I’ll share one of the ways I read conference papers; I like to call it “paper blitz”.

In a paper blitz session, the main goal is to cover all the papers that I will find “interesting” at a really superficial level. And I really mean ALL. The general idea is to understand the “meta-trends” of the conference submissions and to group papers under the same trend, giving a more holistic view of either the types of problems an approach can solve or the variations of approaches to a single problem type.

For each paper, the objective is to identify:

Before we continue, here’s a disclaimer. A paper blitz is not a usual paper reading session, nor a deep or shallow dive into the papers. Since we are reading really superficially, we might also mistakenly criticize papers. I’ll reiterate that the goal here is recall (how many papers we can cover in a short time), not precision (how well we understand each paper). I also recommend bookmarking the papers that deserve a deeper dive as a follow-up to the paper blitz.

Filter the papers [15-20 mins]

The first act of the paper blitz is to shortlist ~50 papers that we want to read in the blitz. This is usually the hardest part, and there’s no easy way to do it, so we resort to judging each paper by its title. Unsolicited advice to paper authors: make your paper title informative and interesting.

And now, let’s begin…

From https://aclanthology.org/events/acl-2022/, I normally go through every title one by one and copy-paste the ones I find interesting into a notepad. There will definitely be an inherent bias towards papers on your pet topics, by famous authors, or simply by NLP friends you haven’t been in touch with since the pre-covid days. Don’t fight the bias; just put them into your list. But if your whole list is made up of NLP friends’ papers, either you have too many friends, or you should cut down your list and force yourself to go through the anthology again. Iterate the process until your list has ~50 papers.

I strongly recommend that you resist the urge to use Ctrl+F, and that you don’t pick papers based only on a specific topic. A paper blitz should feel more like a discovery shopping experience than an e-commerce search experience: think window-shopping while scrolling through an e-commerce app, or, if you are physically at the conference while doing a paper blitz, window-shopping in a brick-and-mortar mall.

Here’s some pseudo-code for the filtering process of the paper blitz.

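def is_numberish(paper_blitz, max_num=50, ish_margin=0.2):
    # True when the list size is strictly between 40 and 60 papers (for the defaults).
    return max_num * (1 - ish_margin) < len(paper_blitz) < max_num * (1 + ish_margin)

paper_blitz = []

# Keep going through the whole anthology until the list is 50-ish papers.
while not is_numberish(paper_blitz, max_num=50):
    paper_blitz = []  # Start every pass with a fresh list, so no paper is added twice.
    for paper in acl_anthology:
        if select_title(paper):   # Use your own select_title func, stricter or looser on each pass.
            paper_blitz.append(paper)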
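The filtering loop above relies on your own judgement inside select_title(), so it can’t run as-is. For something you can actually execute, here is a minimal sketch under loudly stated assumptions: your gut reaction to each title is modelled as a hypothetical `interest()` score in [0, 1], and instead of re-reading the anthology by eye, each pass bisects a cut-off on that score until the shortlist is “50-ish”. `paper_blitz_filter` and `interest` are my own illustrative names, not part of the ACL Anthology or any tool, and the toy anthology is made-up placeholder data.

```python
import random

def is_numberish(papers, max_num=50, ish_margin=0.2):
    """True when len(papers) is strictly between 40 and 60 (for the defaults)."""
    return max_num * (1 - ish_margin) < len(papers) < max_num * (1 + ish_margin)

def paper_blitz_filter(anthology, interest, max_num=50, max_passes=50):
    """Shortlist ~max_num titles by bisecting a cut-off on a hypothetical interest score.

    Assumes interest(title) returns a number in [0, 1]; it stands in for your gut
    reaction to a title and is NOT something the ACL Anthology provides.
    """
    lo, hi = 0.0, 1.0
    shortlist = []
    for _ in range(max_passes):
        threshold = (lo + hi) / 2
        # A fresh pass over every title, so nothing is counted twice.
        shortlist = [title for title in anthology if interest(title) >= threshold]
        if is_numberish(shortlist, max_num):
            break
        if len(shortlist) > max_num:
            lo = threshold  # Too many papers: be pickier on the next pass.
        else:
            hi = threshold  # Too few papers: be more lenient on the next pass.
    return shortlist

if __name__ == "__main__":
    # Made-up toy data: 600 placeholder titles with random "interest" scores.
    rng = random.Random(42)
    toy_anthology = [f"Paper {i}" for i in range(600)]
    toy_scores = {title: rng.random() for title in toy_anthology}
    shortlist = paper_blitz_filter(toy_anthology, toy_scores.get)
    print(f"Shortlisted {len(shortlist)} of {len(toy_anthology)} titles")
```

Bisection is just one way to “iterate until ~50”; since raising the cut-off can only shrink the list, halving the search range each pass converges quickly, whereas a fixed step can overshoot back and forth around the target.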

Drumroll… here it is… ta-da!!

For ACL 2022, here’s my personal paper blitz list. You can see there’s definitely a bias towards my pet topics:

Categorize the papers [25-30 mins]

The filtered list still looks like a mental overload, but that is the goal of the paper blitz: to cover as many papers as possible. The next step in the process is to categorize the papers. I usually put a category on the first paper, then check whether the second paper fits into the same category; if not, I create a second category. I iterate like this until the end of the list, then repeat the categorization from the top to see whether any papers need to be reshuffled across categories, and recur until I’m satisfied.

Here’s another piece of pseudo-code for the categorization process:

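from collections import defaultdict

cat_to_paper = defaultdict(list)  # Categories to Papers mapping (one category holds many papers).

def categorize(paper, min_sim=0.5):
    max_sim = 0   # Variable to keep the maximum similarity.
    paper_category = None
    for cat in cat_to_paper:
        # cosine() and vectorize() are just illustrations, proxies for our human brains.
        # cosine is a similarity function, a proxy for how our brain relates things.
        # vectorize is a function that converts a paper into an abstract numerical vector.
        similarity = cosine(vectorize(paper), vectorize(cat))
        if similarity > max_sim:
            max_sim = similarity
            paper_category = cat
    # If no existing category fits well enough, create a new one.
    # new_category_name() and min_sim are hypothetical: name and threshold it yourself.
    if max_sim < min_sim:
        paper_category = new_category_name(paper)
    return paper_category

# There are no fixed satisfaction criteria; you'll have to come up with your own.
while not is_satisfied(cat_to_paper):
    # Keep the category names but empty them, so papers can be reshuffled on every pass.
    for papers in cat_to_paper.values():
        papers.clear()
    for paper in paper_blitz:
        cat = categorize(paper)
        cat_to_paper[cat].append(paper)  # Append, so earlier papers aren't overwritten.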
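For a runnable counterpart, here is a minimal sketch that swaps the human brain for crude stand-ins, so treat every choice in it as an assumption: word overlap (Jaccard similarity over lower-cased title words) plays the role of cosine(vectorize(...)), a category is represented by the union of its papers’ words, the `min_sim=0.2` cut-off for opening a new category is arbitrary, and “no paper moved between passes” stands in for is_satisfied(). This is essentially greedy, leader-style clustering followed by reshuffle passes; `words`, `jaccard` and `categorize_all` are hypothetical names for illustration only.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "with", "to", "be", "can"}

def words(title):
    """Lower-cased content words of a title: a crude stand-in for vectorize()."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP_WORDS}

def jaccard(a, b):
    """Word-overlap similarity in [0, 1]: a crude stand-in for cosine()."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def categorize_all(titles, min_sim=0.2, max_passes=10):
    """Greedy grouping, then reshuffle passes until no paper changes category."""
    groups = []  # Each group is a list of titles.
    for _ in range(max_passes):
        # Freeze each category's profile (union of its papers' words) for this pass.
        profiles = [set().union(*(words(t) for t in group)) for group in groups]
        new_groups = [[] for _ in groups]
        for title in titles:
            w = words(title)
            sims = [jaccard(w, p) for p in profiles]
            if sims and max(sims) >= min_sim:
                best = sims.index(max(sims))  # Best-fitting existing category.
            else:
                # Nothing fits well enough: open a new category seeded by this paper.
                profiles.append(w)
                new_groups.append([])
                best = len(new_groups) - 1
            new_groups[best].append(title)
        new_groups = [group for group in new_groups if group]  # Drop emptied categories.
        if new_groups == groups:  # Nothing moved: our stand-in for is_satisfied().
            break
        groups = new_groups
    return groups

if __name__ == "__main__":
    toy_titles = [
        "Idiom processing in neural machine translation",
        "Compositionality in neural machine translation",
        "A multilingual dataset for question answering",
        "A crosslingual dataset for question answering",
    ]
    for i, group in enumerate(categorize_all(toy_titles)):
        print(i, group)
```

Title-word overlap is a far blunter instrument than actually skimming a paper; it is only there so the control flow (best match, else a new category, then reshuffle until stable) can be seen running end to end.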

Presto! There you go!

Here’s a categorized version of my filtered list from ACL 2022:

Multi-Word Expression / Compositionality

Low-Resource Language / Problems

Multilinguality / Crosslingual NLP

Unsupervised / Semi-Supervised

Datasets

Language Learning / Understanding Language

Machine Translation (MT) Tricks / MT Evaluation

Applications

Model Architectures / Optimization

Misc

The Actual Paper Reading!!

Before the actual reading let’s do some backward time management, because we have

Given a ~7-hour blitz, with ~1 hour spent on the filtering and categorizing steps, we get 360 mins left for 50 papers. Thus, we have around 7 mins per paper (360 / 50 = 7.2).

But there must be a better way than to use 7 mins for each paper! No?

Yes, there is. We can use the paper categories to give us a little more leeway when we blitz through the papers. For example, we have 3 papers in the Multi-Word Expression / Compositionality topic, so we get 21 minutes (3 × 7 mins) for the whole topic.

Since they are on the same topic, the time spent reading the “Related Work” or “Previous Work” sections can be collapsed, because we don’t need to put much effort into reading those sections in the second and third papers.

Most probably, you can also score bonus time if the papers use similar approaches or all wield the same “Transformer is All You Need” shiny hammer 🔨.
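The backward time management above is simple enough to do in your head, but here is a minimal sketch of it, assuming the reading time is split evenly across papers and rounded down to whole minutes per paper. `minutes_per_paper` and `category_budgets` are illustrative names, and the 47-paper “All other categories” bucket is a placeholder, since the split of the remaining papers across categories isn’t listed here.

```python
def minutes_per_paper(reading_mins, n_papers):
    """Even split of the reading budget across all shortlisted papers."""
    return reading_mins / n_papers

def category_budgets(reading_mins, category_sizes):
    """Give each category (papers in it) x (whole minutes per paper)."""
    n_papers = sum(category_sizes.values())
    per_paper = int(minutes_per_paper(reading_mins, n_papers))  # Round down to whole minutes.
    return {category: size * per_paper for category, size in category_sizes.items()}

if __name__ == "__main__":
    # 3 papers are in the compositionality topic; the other 47 are lumped together
    # purely for illustration.
    sizes = {"Multi-Word Expression / Compositionality": 3, "All other categories": 47}
    print(minutes_per_paper(360, 50))  # 7.2
    print(category_budgets(360, sizes))  # {'Multi-Word Expression / Compositionality': 21, 'All other categories': 329}
```

Rounding 7.2 down to 7 leaves ~10 mins of slack out of 360, and the savings from collapsing Related Work sections or shared approaches are deliberately not modelled; they are the leeway you spend inside each category.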

Let’s start with the “Multi-Word Expression / Compositionality” topic

The first thing to note when opening the PDFs: the same author wrote the first two papers in this topic!

Starting with the “Can Transformer be Too Compositional?” paper, the first thing you will notice is a common gestalt of ACL NLP papers: a figure or example at the top of the right-hand column on the first page.

Legend has it that this top-right-column figure/example gestalt is commonly attributed to Percy Liang, but I’m not too sure it holds for every paper he has authored or co-authored. For example:

And here’s the “summary” of the meta-trend for the three papers in this topic:

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation.

Verna Dankers, Christopher Lucas and Ivan Titov.

The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study.

Verna Dankers, Elia Bruni and Dieuwke Hupkes.

Word-level Perturbation Considering Word Length and Compositional Subwords.

Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki and Naoaki Okazaki.


Intermission…

Let’s do a quick round-up of the first topic on compositionality and see how we did in the blitz. We took 12 + 8 mins for the first two, closely related papers, and 12 mins for the third paper, longer than the second because it was sort of on a different track: it’s more likely to be classified under LM Tricks than MWE and Compositionality, but that’s the result of “judging the paper by its title” =)

So far, we’ve spent a total of 32 mins on 3 papers, 11 mins over our 21-minute estimate, but we’ve possibly saved future time on reading the related work for the LM Tricks topic. I hope the exercise up to this point gives you some tips on how researchers blitz through multiple papers in a short time, and how reading papers grouped in some manner helps save time.

Ready for another topic to blitz through? Let’s go…

Alright, alright, alright! I’ll spare you the torment =)

You definitely don’t want to spend 3-4 hours reading through my summary, and I’d suggest you take the time to do your own paper blitz instead. But in actual fact, I did sit down for another 5+ hours to complete my blitz, though my “meta-trend” notes are not as clean as the ones presented here. TBH, I did run a bit over time, but considering I spent around 6 hours in total reading through ~50 papers, I think I deserve to chill out with a cold Sapporo beer to end my night.

Summary

This article introduces a mechanism that I personally use to read many papers quickly so that I can catch up with the ever-growing number of papers published in the Natural Language Processing (NLP) field.

My personal joy in reading papers is always thinking about what new nuggets of knowledge I can get from just a handful of papers, and somehow figuring out how to try these approaches in my work projects. Blitzing through papers has helped me identify these nuggets faster than painfully doing shallow or deep dives into a handful of popular papers.

I hope that paper blitz can help newcomers bootstrap their knowledge of the field and its “state-of-the-art” trends, and help seasoned researchers save some time finding the interesting nuggets in a haystack of accepted publications.

Until next time, have fun blitzing through academic papers. Go for it!

P/S: I wouldn’t recommend doing this regularly. I usually only do it right before a conference starts or as soon as it ends, and only 1-3 paper blitzes a year: sometimes for an *ACL conference, sometimes purely on WMT and MT workshop papers, and sometimes my filtering set starts with a whole mass of “googled” results.