In software engineering, a data pipeline is exactly what the name suggests: a way for data to 'flow' from a source to a destination. Every data pipeline consists of three stages.

Extract phase: In this phase, data is pulled from the source. This could mean calling an API, reading files from object storage (like AWS S3), or running queries against a database (like AWS RDS).

Transform phase: In this phase, transformations are applied to the data to massage it into a different format. This could be transformations at rest (like normalizing all documents in a data lake) or transformations in motion (like converting records from one format to another as they move through the pipeline).

Load phase: In this phase, the data is loaded into a target store. This could be a data warehouse, a database, or even a customer-facing tool like a Tableau dashboard. A minimal sketch of all three phases follows.
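To make the three phases concrete, here is a minimal Python sketch. The orders.csv file, its id and total columns, and the in-memory warehouse list are hypothetical stand-ins for a real source and destination.

```python
import csv

def extract(path: str) -> list[dict]:
    """Extract phase: read raw rows from a CSV source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    """Transform phase: massage raw rows into the target format."""
    return [
        {"order_id": int(r["id"]), "total_cents": round(float(r["total"]) * 100)}
        for r in rows
    ]

def load(rows: list[dict], target: list) -> None:
    """Load phase: write the transformed rows to the destination."""
    target.extend(rows)

# An in-memory list stands in for a real warehouse or database.
warehouse: list[dict] = []
load(transform(extract("orders.csv")), warehouse)
```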

Every data pipeline begins with the Extract phase. But the order in which the other two phases occur divides data pipelines into two broad categories, explained below.

ETL Pipelines

In ETL pipelines, the order of operations is Extract, Transform and Load.

In ETL pipelines, data is extracted from one or more sources, transformed on the fly using a set of business rules, and loaded into a target repository. ETL pipelines usually move data into a relational store like a SQL database, which makes subsequent querying fast.
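As a rough illustration, the sketch below runs the same three steps in ETL order: each row is transformed in flight and only the cleaned, typed record is loaded. SQLite stands in for a production SQL warehouse, and the orders.csv source with its id and total columns is again hypothetical.

```python
import csv
import sqlite3

def run_etl(source_path: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, total_cents INTEGER)"
    )
    with open(source_path, newline="") as f:
        for row in csv.DictReader(f):
            # Transform on the fly: apply business rules row by row...
            record = (int(row["id"]), round(float(row["total"]) * 100))
            # ...and load only the transformed record into the relational store.
            conn.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)", record)
    conn.commit()
    conn.close()

run_etl("orders.csv", "warehouse.db")
```

Because the data lands already structured and typed, queries against the orders table need no further processing.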

ELT Pipelines

In ELT pipelines, the order of operations is Extract, Load and Transform.

In ELT pipelines, data is extracted from one or more sources and stored, raw, in a data lake. Transformations are applied only when required, just before the data is consumed or served. ELT is a relatively recent approach, made practical by cheap cloud storage and powerful on-demand compute.
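Here is a minimal sketch of the same idea in ELT order: the raw extract lands untouched in a local directory (a stand-in for object storage like S3), and the transformation runs only at read time. The record shape and the lake/orders path are hypothetical.

```python
import json
import pathlib
from datetime import date

LAKE = pathlib.Path("lake/orders")

def extract_and_load(raw_records: list[dict]) -> pathlib.Path:
    """E and L: write the raw records to the lake, unmodified."""
    LAKE.mkdir(parents=True, exist_ok=True)
    path = LAKE / f"{date.today().isoformat()}.json"
    path.write_text(json.dumps(raw_records))
    return path

def transform_on_read(path: pathlib.Path) -> list[dict]:
    """T: apply business rules only when the data is consumed."""
    return [
        {"order_id": int(r["id"]), "total_cents": round(float(r["total"]) * 100)}
        for r in json.loads(path.read_text())
    ]

# A hypothetical raw extract, stored as-is and transformed on demand.
raw = [{"id": "1", "total": "19.99"}, {"id": "2", "total": "5.00"}]
snapshot = extract_and_load(raw)
print(transform_on_read(snapshot))
```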

Advantages of ELT over ETL

ELT, though newer to the field, has some advantages over ETL that make it the better choice in certain scenarios. Because data is loaded before it is transformed, ingestion is fast, and the raw data is preserved in the lake. That means transformations can be redefined and replayed later, without re-extracting anything from the source.
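Continuing the hypothetical ELT sketch above (reusing its snapshot, json, and pathlib names), this is one way that flexibility plays out: a revised business rule can be replayed over the stored raw data at any time.

```python
def transform_v2(path: pathlib.Path) -> list[dict]:
    """A revised transform, replayed over the same raw snapshot."""
    return [
        {"order_id": int(r["id"]), "total_dollars": float(r["total"])}
        for r in json.loads(path.read_text())
    ]

print(transform_v2(snapshot))  # no second extraction required
```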

Conclusion

ETL and ELT pipelines each have their own advantages and disadvantages. While ETL pipelines are often the default choice, an ELT pipeline could very well suit your particular use case better. The final decision, of course, depends on the requirements your data pipeline must meet.