A new paper from researchers at Google DeepMind demonstrates that large language models like DeepMind's Chinchilla are not just adept at generating human-like text; they are also excellent general-purpose compressors. They can compress many types of data, including text, images, and audio, to sizes that rival, and in some cases beat, specialized compression algorithms like gzip and PNG.

Why Should We Care About Compression?

Data compression is a fundamental capability in computing and AI. Compressing data means we can store and transmit it using less memory, disk space, and bandwidth. This saves costs and allows systems to scale.

But more importantly, good compression also indicates a deep understanding of the structure and patterns in data. To compress well, an algorithm needs to spot redundancies and exploit statistical regularities. So, compression capability acts as a benchmark for how much knowledge an AI system has learned.
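This link between prediction and compression can be stated precisely. The identity below is the standard source-coding result this line of work builds on (stated here from first principles rather than quoted from the paper): the expected number of bits an ideal arithmetic coder spends per symbol equals the model's cross-entropy, which is the data's true entropy plus a penalty for how far the model's predictions are from the truth.

```latex
% Expected code length when data drawn from the true distribution p is
% compressed with an ideal arithmetic coder driven by a predictive model q:
\[
  \mathbb{E}_{x \sim p}\!\left[-\log_2 q(x)\right]
  \;=\; H(p) \;+\; D_{\mathrm{KL}}\!\left(p \,\|\, q\right)
\]
% H(p) is the irreducible entropy of the data; the KL term is the extra cost
% of imperfect prediction, so lower log-loss means better compression.
```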

The fact that huge natural language models can compress such varied data types so efficiently has major implications.

How Was the Research Conducted?

The DeepMind researchers tested the compression capabilities of different-sized language models on three 1GB datasets: enwik9 (Wikipedia text), ImageNet (image patches), and LibriSpeech (audio samples).

They compared the models against standard compression algorithms such as gzip, LZMA2, PNG, and FLAC, which are either general-purpose or specialized for particular data types like images and audio.

The language models are turned into compressors using arithmetic coding, a technique that converts any predictive model into a lossless compressor. The more accurately a model can predict the next byte in a file, the fewer bits are needed to encode that byte, and the better the data compresses.
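To make the idea concrete, here is a minimal sketch of arithmetic coding driven by a predictive model, using exact fractions to keep the bookkeeping simple. The `model_probs` function is a placeholder assumed for illustration; in the paper's setup, the language model's conditional distribution over the next byte takes its place, and a production coder would work with fixed-precision integers and emit an actual bit stream.

```python
from fractions import Fraction

def model_probs(context: bytes):
    """Hypothetical predictive model: returns a probability for each of the
    256 possible next bytes given the bytes seen so far. A language model's
    conditional distribution would go here; this placeholder returns a
    uniform distribution so the example stays self-contained."""
    return [Fraction(1, 256)] * 256

def encode(data: bytes):
    """Arithmetic-encode a byte string into an interval [low, low + width)."""
    low, width = Fraction(0), Fraction(1)
    for i, byte in enumerate(data):
        probs = model_probs(data[:i])
        cum = sum(probs[:byte])          # total probability of all smaller bytes
        low += width * cum               # narrow the interval to this byte's slice
        width *= probs[byte]
    # Any number in [low, low + width) identifies the whole sequence. A real
    # coder emits about -log2(width) bits, so sharper predictions (wider
    # per-symbol slices) mean fewer bits.
    return low, width

def decode(low: Fraction, length: int) -> bytes:
    """Recover the bytes from the interval's lower end using the same model."""
    point = low
    out = bytearray()
    lo, w = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = model_probs(bytes(out))
        cum = Fraction(0)
        for sym, p in enumerate(probs):
            if lo + w * (cum + p) > point:   # point falls inside this byte's slice
                out.append(sym)
                lo += w * cum
                w *= p
                break
            cum += p
    return bytes(out)

if __name__ == "__main__":
    message = b"hi"
    low, width = encode(message)
    assert decode(low, len(message)) == message
```

Because the encoder and decoder share the same predictive model, only the final interval (in practice, a bit string of about -log2(width) bits) needs to be transmitted.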

They tested language models ranging from small Transformers trained from scratch on text to DeepMind's pretrained Chinchilla models at 1B, 7B, and 70B parameters.

Key Technical Findings

The experiments yielded several insightful results:

- Chinchilla 70B, despite being trained primarily on text, compressed ImageNet image patches to 43.4% of their raw size and LibriSpeech audio to 16.4%, beating the domain-specific compressors PNG (58.5%) and FLAC (30.3%).
- The prediction-compression equivalence cuts both ways: any compressor defines a predictive model, so standard compressors like gzip can also be used as (weak) generative models.
- Raw compression rates ignore the size of the model itself; once the parameters are counted as part of the compressed output, the largest models are no longer competitive, so scaling only pays off when there is enough data to amortize the model's size.
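For a sense of how such compression rates are computed: the rate is simply compressed size divided by raw size, and a predictive model's achievable rate follows from its average log-loss in bits per byte. The sketch below is illustrative only; the 3.5 bits-per-byte figure is a made-up placeholder, not a number from the paper.

```python
import gzip

def compression_rate(raw: bytes, compressed: bytes) -> float:
    """Compressed size as a fraction of raw size (lower is better)."""
    return len(compressed) / len(raw)

def model_rate(avg_bits_per_byte: float) -> float:
    """Achievable rate for a predictive model: an ideal arithmetic coder needs
    roughly the model's average log-loss in bits per byte, out of 8 raw bits."""
    return avg_bits_per_byte / 8.0

if __name__ == "__main__":
    raw = b"example data, repeated to give gzip something to work with. " * 500
    print(f"gzip rate:  {compression_rate(raw, gzip.compress(raw)):.3f}")
    # Hypothetical model with an average log-loss of 3.5 bits per byte:
    # it would compress this kind of data to about 44% of its raw size.
    print(f"model rate: {model_rate(3.5):.3f}")
```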

Key Implications

These findings have significant implications for how we benchmark and understand large models.

In summary, this research shows large language models have become adept general-purpose learners. Their exceptional compression capabilities demonstrate an expansive understanding of patterns in textual, visual, and audio data. There is still progress to be made, but these models show increasing competence as general systems for automating prediction and compression across modalities.

