Introduction

Hi everyone! Today, I want to talk about time-series data management. We encountered this challenge while working on a project, and I will share our experience with TimescaleDB and highlight its outstanding features.

Understanding time-series data: characteristics and challenges

Time-series data is simply a sequence of data points collected over time. Think of it as measurements or events that have timestamps attached to them. It's append-only (we're mostly adding new data, not changing historical records), naturally ordered by time, and the time element itself is usually crucial for analysis.

Common examples of time-series data include: stock prices over time, weather measurements (temperature, humidity, wind speed), monthly subscriber counts on a website, sensor readings from IoT devices, etc.

In a real-world e-commerce project we recently built, our team encountered this type of data. We needed to track how many times product pages were viewed and how often users clicked on them. It was also necessary to record each product's daily position in search results since merchants paid for premium placements. This created a perfect time-series dataset–every day, we collected thousands of new records with timestamps, while the historical data remained unchanged. We used this data to show merchants evidence that higher positions actually led to more visibility and clicks.

Common challenges

When working with time-series data, there are several important challenges to consider:

This requires specialized approaches for optimal storage and querying. Large volumes of time-series data can overwhelm traditional relational databases, especially when they perform complex time-based aggregations.

Time-series databases

There are several specialized database systems that exist to address these challenges, such as InfluxDB, Prometheus, Apache Druid, MongoDB, and Amazon Timestream. Each has its strengths and works in different use cases; however, this article will focus on TimescaleDB.

TimescaleDB

TimescaleDB is a time-series database built as an extension to PostgreSQL with time-series optimizations. Unlike other solutions that require learning new query languages, TimescaleDB lets users continue using SQL.

Key features:

Below we'll see TimescaleDB in action with some real-world examples.

Practical example: Intel Lab sensor data

For this practical example, we're using sensor data from the Intel Berkeley Research Lab. This dataset contains millions of temperature, humidity, light, and voltage readings over a 2-month period from 54 sensors deployed throughout the lab. We chose this dataset because, despite its relatively small size (about 2.3 million readings), it has enough data to demonstrate the performance differences between PostgreSQL and TimescaleDB.

Setting up our environment

First, we created two tables to compare performance:

  1. A regular PostgreSQL table (sensor_data_postgres)

  2. A TimescaleDB hypertable (sensor_data_timescale)

Both tables have identical schemas and indexes:

-- Regular PostgreSQL table
CREATE TABLE sensor_data_postgres (
    time        TIMESTAMPTZ NOT NULL,
    epoch       INTEGER,
    sensor_id   INTEGER NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION,
    light       DOUBLE PRECISION,
    voltage     DOUBLE PRECISION
);

-- TimescaleDB table
CREATE TABLE sensor_data_timescale (
    time        TIMESTAMPTZ NOT NULL,
    epoch       INTEGER,
    sensor_id   INTEGER NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION,
    light       DOUBLE PRECISION,
    voltage     DOUBLE PRECISION
);

-- Convert to hypertable
SELECT create_hypertable('sensor_data_timescale', 'time');

Performance comparison

All performance tests were run on:

Note: Performance results may vary on different hardware configurations, but the overall tendency should remain consistent across systems.

We loaded 1,841,828 rows and ran identical queries on both tables. Let's look at each query type and its results:

  1. Full range select: Basic retrieval of all data within a time range

SELECT * FROM sensor_data_postgres 
WHERE time >= '2004-02-28 00:58:46' 
AND time <= '2004-04-05 11:02:32';

-- Same query for TimescaleDB table

Results:

Insights:

  1. Time-based aggregations: Grouping and analyzing data by time intervals

-- PostgreSQL (Daily aggregation)
SELECT 
    date_trunc('day', time) as day,
    sensor_id,
    COUNT(*) as readings,
    AVG(temperature) as avg_temp,
    stddev(temperature) as temp_stddev
FROM sensor_data_postgres
WHERE time >= '2004-02-28' AND time <= '2004-04-05'
GROUP BY day, sensor_id
HAVING stddev(temperature) > 2
ORDER BY day, sensor_id;

-- TimescaleDB (using time_bucket)
SELECT 
    time_bucket('1 day', time) as day,
    sensor_id,
    COUNT(*) as readings,
    AVG(temperature) as avg_temp,
    stddev(temperature) as temp_stddev
FROM sensor_data_timescale
WHERE time >= '2004-02-28' AND time <= '2004-04-05'
GROUP BY day, sensor_id
HAVING stddev(temperature) > 2
ORDER BY day, sensor_id;

Results:

Insights:

Advanced timescaleDB features: continuous aggregates

Think of continuous aggregates as a personal data assistant that works ahead of time. Instead of forcing a database to recalculate the same aggregations (like daily averages or hourly counts) every time someone views a dashboard, TimescaleDB does this work in advance and keeps it ready to serve instantly.

Imagine temperature data being tracked and updated every minute, but the dashboard only needs to show daily averages. Rather than scanning millions of raw data points each time the dashboard loads, continuous aggregates pre-calculate data and store the required results. They automatically update as new data arrives.

Let's compare a regular PostgreSQL view with a TimescaleDB continuous aggregate:

  1. PostgreSQL View vs TimescaleDB Continuous Aggregate

-- Regular PostgreSQL view (computed on every query)
CREATE VIEW pg_daily_sensor_stats AS
SELECT
    date_trunc('day', time) as day,
    sensor_id,
    AVG(temperature) as avg_temp,
    MIN(temperature) as min_temp,
    MAX(temperature) as max_temp,
    COUNT(*) as reading_count
FROM sensor_data_postgres
GROUP BY day, sensor_id;

-- TimescaleDB continuous aggregate (materialized and automatically refreshed)
CREATE MATERIALIZED VIEW ts_daily_sensor_stats
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', time) as day,
    sensor_id,
    AVG(temperature) as avg_temp,
    MIN(temperature) as min_temp,
    MAX(temperature) as max_temp,
    COUNT(*) as reading_count
FROM sensor_data_timescale
GROUP BY day, sensor_id;

-- Set up automatic refresh policy
SELECT add_continuous_aggregate_policy('ts_daily_sensor_stats',
    start_offset => INTERVAL '3 days',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

  1. Querying the aggregates

-- Query against PostgreSQL view (computed on demand)
SELECT * FROM pg_daily_sensor_stats
WHERE day >= '2004-03-01' AND day <= '2004-03-31'
ORDER BY day, sensor_id;

-- Query against TimescaleDB continuous aggregate (pre-computed)
SELECT * FROM ts_daily_sensor_stats
WHERE day >= '2004-03-01' AND day <= '2004-03-31'
ORDER BY day, sensor_id;

Results when querying a month of data:

Storage optimization:

TimescaleDB also saves storage costs while maintaining query performance. With our temperature sensor data, 54 sensors taking readings every minute for months, this means we will need extra storage based on time.

For newer TimescaleDB versions (v2.18.0+), Hypercore automatically handles the storage management. But for this example, we’ll take a look at a compression policy.

Setting up compression is simple with compression policies — just tell TimescaleDB to compress chunks older than a certain age (like 7 days), and it handles everything automatically in the background.

Both PostgreSQL and TimescaleDB tables start at similar sizes (around 209 MB) with our test dataset:

-- Check regular PostgreSQL table size
SELECT pg_total_relation_size('sensor_data_postgres')/1024/1024 as size_mb;
-- PostgreSQL size: 209 MB

-- Check hypertable size before compression
SELECT hypertable_size('sensor_data_timescale')/1024/1024 as size_mb;
-- TimescaleDB size before compression: 208 MB

-- Enable compression on table
ALTER TABLE sensor_data_timescale SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time'
);

-- Create compression policy (compress chunks older than 7 days)
SELECT add_compression_policy('sensor_data_timescale', INTERVAL '7 days');

-- Check hypertable size after compression
SELECT hypertable_size('sensor_data_timescale')/1024/1024 as size_mb;
-- TimescaleDB size after compression: 35 MB (83% reduction)

Insights:

Learn more about compression settings

Final thoughts

Our performance tests demonstrated TimescaleDB's advantages for time-series data management. While traditional PostgreSQL performs better for simple queries, TimescaleDB is a good choice for complex time-based operations with performance improvement for aggregation queries.

In our real-world e-commerce project, we successfully implemented TimescaleDB to track product page views, click-through rates, and search position rankings. This gave us the perfect solution for our expanding dataset—every day, we collected thousands of new timestamped records while keeping historical data intact. Using continuous aggregation, we were able to efficiently analyze how premium placements affected visibility and clicks across a 2-3 year period, providing merchants with evidence that higher positions actually increased engagement.

The storage benefits were significant, too. TimescaleDB's compression reduced our storage needs by about 30-40%. This led to real cost savings in our cloud bills. Our analytics platform became more responsive, and our budget healthier. These benefits matter even more when the situation involves large amounts of time-series data.

TimescaleDB is worth considering for work with growing time-series data that requires complex analysis. It offers powerful tools and accommodates SQL.