TimescaleDB is a PostgreSQL extension for high-performance real-time analytics on time-series and event data.

In essence, it is standard Postgres, extended for time-series workloads.

Important: This post uses Terraform to create and destroy resources on an AWS account. Make sure you destroy the resources as soon as you no longer need them. Otherwise, you will be charged more than you might want.

How do you set up a TimescaleDB on AWS?

Unfortunately, at the time of this writing (Jul 2025), you can’t use the managed AWS RDS service for this, nor the AWS RDS Custom service.

Subscribe to Hosted TimescaleDB Services

If you want to use TimescaleDB, one option is to subscribe to one of the companies that offer it as a hosted service.

Set it Up Manually on your AWS Account

This requires more DevOps competence on your part, but it gives you more flexibility.

This is the option I am going to go with in this blog post.

Here is an architecture diagram of the demo project I will develop to demonstrate how to set up TimescaleDB on AWS.

TimescaleDB on my AWS Account

Project Setup

Source code here:

Third-party Tools

These are the tools and their versions at the time of writing (by the way, I am using asdf as my third-party tools version manager):

Git - Don’t Go Without

I don’t go without git. I also create a .gitignore file. Initially, I have the following folders and files ignored:

.env
.env.*
.envrc
*.pem
*.pem.pub

terraform/.terraform
*.tfstate
*.tfstate.backup

Folder For Terraform Files

I like to create a folder named terraform and put everything inside it.

So, all the Terraform-related files will live inside this terraform sub-folder.

Also, the shell commands that I execute, like terraform validate, are run from within this sub-folder. I indicate this with the terraform > prefix at the beginning.

Main Terraform Setup

The main.tf file declares the versions of Terraform and the AWS provider I will work with.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.0.0"
    }
  }

  required_version = "~>1.12.0"
}

Backend To Save Terraform State

I will leave the default backend (local). However, you might want to use other backends in your own projects.

I define the backend inside the file backend.tf

terraform {
  backend "local" {
    path = "terraform.tfstate"
  }
}

Initialize

With main.tf and backend.tf in place, I run terraform init.

terraform > terraform init

AWS Region

I will use Europe/Frankfurt, which is eu-central-1. I will set it as the default value of the region variable inside terraform/variables.tf (file link here). But since this is declared as an input variable, you can pass whichever value you want.
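For reference, a minimal sketch of what that declaration might look like (the description wording is mine):

variable "region" {
  description = "The AWS region to deploy the resources into."
  type        = string
  default     = "eu-central-1"
}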

AWS Provider

I set up the AWS provider configuration as follows (inside the file providers.tf):

# Without +alias+ this is the default +aws+ provider
#
provider "aws" {
  region = var.region
  default_tags {
    tags = {
      project     = var.project
      terraform   = "1"
      environment = var.environment
      tf_repo     = var.repository_url
      tf_folder   = "terraform/${var.environment}"
    }
  }
}

This introduces the following three input variables: project, environment, and repository_url.

I have to declare them inside the variables.tf file. I will also set some default values. Set your own values to match your settings:

variable "project" {
  description = "The name of the project."
  type        = string
  default     = "setting_up_timescaledb_on_aws"
}

variable "environment" {
  description = "The environment for the deployment (e.g., development, staging, production)."
  type        = string
  default     = "development"
}

variable "repository_url" {
  description = "The GitHub repository URL for the project."
  type        = string
  default     = "https://github.com/pmatsinopoulos/setting_up_timescaledb_on_aws"
}

VPC

I will use the default VPC in the region selected. There is always a default VPC. It has plenty of IP addresses (172.31.0.0/16 - 65,536 addresses).

Since the VPC already exists and I will not create it, I just need to create a data source reference. I will put that inside the data.tf file:

data "aws_vpc" "timescaledb_vpc" {
  id = var.vpc_id
}

I am using an input variable to reference the VPC by its ID, which means that I need the input variable definition inside the variables.tf file:

variable "vpc_id" {
  description = "The ID of the VPC where the EC2 instance will be launched in."
  type        = string
}

VPC - Subnet

The default VPC in each region comes with 3 default subnets, one in each Availability Zone:

I will use the subnet deployed in eu-central-1a. I will pass its ID via a variable:

variable "subnet_id" {
  description = "The ID of the subnet where the EC2 instance will be launched in."
  type        = string
}

I declare the data source for the subnet inside the data.tf file, in which I declare resources that I reference rather than create.

data "aws_subnet" "timescaledb_subnet" {
  id = var.subnet_id
}

SSH Access Security Group

I will need two security groups: one to allow incoming SSH traffic on port 22, and another one for Postgres traffic on port 5432.

Here is how I set up the SSH one. I write its configuration inside the file security_groups.tf.

resource "aws_security_group" "ssh" {
  name        = "${var.project}-${var.environment}-security-group-ssh"
  description = "Allow SSH traffic from anywhere to anywhere"
  vpc_id      = data.aws_vpc.timescaledb_vpc.id
  ingress {
    description = "SSH from anywhere"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    "Name" = "${var.project}-${var.environment}-security-group-ssh"
  }
}

This opens incoming traffic from any IP address on port 22.

Postgres Access Security Group

In the same file, I create another security group that allows Postgres traffic on port 5432.

resource "aws_security_group" "postgres" {
  name        = "${var.project}-${var.environment}-security-group-postgres"
  description = "Allow postgres traffic from anywhere"
  vpc_id      = data.aws_vpc.timescaledb_vpc.id
  ingress {
    description = "postgres from anywhere"
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    "Name" = "${var.project}-${var.environment}-security-group-postgres"
  }
}
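Note that both security groups accept traffic from 0.0.0.0/0, which is acceptable for a short-lived demo. If you want to restrict access, one way is to pull the allowed CIDR blocks into a variable (the allowed_cidr_blocks name below is my own, not part of the post's repository):

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to reach SSH and Postgres."
  type        = list(string)
  default     = ["0.0.0.0/0"]
}

# and then, in both ingress blocks:
#   cidr_blocks = var.allowed_cidr_blocks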

SSH Key Pair

I will need to create an SSH key pair (private/public) and upload the public part of it to AWS key pairs.

I use ssh-keygen to create the key pair. But you can use any method described here. Please note that the EC2 instance that I will create to run the TimescaleDB server is going to be a Linux/Ubuntu server. Hence, I follow the instructions to create an SSH key pair that is compatible with Linux/Ubuntu.

This is how I generate the key on my Linux client:

ssh-keygen -m PEM -f timescaledb.pem

Note that when I am being asked for a passphrase, I leave it empty.

This generates two files locally: timescaledb.pem (the private key) and timescaledb.pem.pub (the public key).

Important: The .gitignore file should include the entry *.pem to make sure the private part of the key is not checked in.

Important: I use the following command to change the access mode of the private key file. This makes sure that the file is only readable by me.

terraform > chmod 400 "timescaledb.pem"

Now I need to declare the resource so that the public key is uploaded to AWS key pairs. I do that inside the file ec2_key_pair.tf

# This imports the public part of an OpenSSH key.
# The key has been generated using `ssh-keygen` in PEM format.
# Example:
#
# ssh-keygen -m PEM
#
# The key has been generated without a passphrase.
#
resource "aws_key_pair" "timescaledb" {
  key_name   = "${var.project}-${var.environment}-timescaledb"
  public_key = file("${path.module}/timescaledb.pem.pub")

  tags = {
    "Name" = "${var.project}-${var.environment}-timescaledb"
  }
}

EC2 Instance

I am now ready to write the EC2 instance resource that I will use to run my TimescaleDB server.

I write this inside the file ec2_instance.tf.

data "aws_ami" "timescaledb" {
  most_recent = true

  filter {
    name   = "block-device-mapping.delete-on-termination"
    values = ["true"]
  }

  filter {
    name   = "is-public"
    values = ["true"]
  }


  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server*"]
  }

  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "timescaledb" {
  ami               = data.aws_ami.timescaledb.id
  instance_type     = var.timescaledb_server_instance_type
  availability_zone = "${var.region}${var.timescaledb_server_availability_zone}"
  subnet_id         = var.subnet_id
  vpc_security_group_ids = [
    data.aws_security_group.timescaledb_vpc_default_security_group.id,
    aws_security_group.ssh.id,
    aws_security_group.postgres.id,
  ]
  key_name                    = aws_key_pair.timescaledb.key_name
  associate_public_ip_address = true

  tags = {
    "Name" = "${var.project}-${var.environment}-timescaledb"
  }
}

output "ec2_timescaledb_public_ip" {
  value       = aws_instance.timescaledb.public_ip
  description = "The public IP address of the EC2 Timescale DB Machine"
}

output "ssh_connect" {
  value       = "ssh -i ${path.module}/timescaledb.pem -o IdentitiesOnly=yes ubuntu@${aws_instance.timescaledb.public_ip}"
  description = "SSH command to connect to the Timescale DB EC2 instance"
}

The data "aws_ami" "timescaledb" { … } block selects the AMI that will be used to launch the EC2 instance described in the resource "aws_instance" "timescaledb" { … } block that follows.

You can see how the resource block is referencing the data block:

...
ami               = data.aws_ami.timescaledb.id
...

Inside the resource block, I am setting the very minimum of properties so that I can launch the EC2 instance as I want it.
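The resource also references a few things that have not been shown yet: the instance type and the availability-zone suffix come from input variables, and the VPC's default security group comes from a data source. Here is a sketch of how these might be declared (the variable names are taken from the resource above; the defaults are my assumptions, except for t3.xlarge, which the post uses later):

variable "timescaledb_server_instance_type" {
  description = "The EC2 instance type for the TimescaleDB server."
  type        = string
  default     = "t3.xlarge"
}

variable "timescaledb_server_availability_zone" {
  description = "The availability zone suffix (e.g. a) appended to the region."
  type        = string
  default     = "a"
}

# in data.tf - the default security group of the VPC
data "aws_security_group" "timescaledb_vpc_default_security_group" {
  vpc_id = data.aws_vpc.timescaledb_vpc.id
  name   = "default"
}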

Outputs:

I also make sure that I export two outputs: the public IP address of the instance (ec2_timescaledb_public_ip) and a ready-to-use SSH command (ssh_connect).

Values to terraform Input Variables

Most of the terraform commands that I will need in this project require a way to pass values to the input variables that don't have a default value.

There are plenty of ways you can pass values to Terraform input variables. For this particular demo, I prefer the method of environment variables.

So, I make sure that the terminal shell I use to execute the terraform commands has the environment variables that Terraform needs set to the correct values.

The tool that helps me do that is direnv. I have created the terraform/.envrc file with the following contents (see terraform/.envrc.sample and create your own terraform/.envrc):

export AWS_PROFILE=...<put here the name of the AWS profile you have configured>...
export TF_VAR_subnet_id=...<put here the subnet id from your VPC>...
export TF_VAR_vpc_id=...<put here the id of your VPC>...

Note that terraform/.envrc is git-ignored and not checked in. This is to avoid checking in sensitive information. The file terraform/.envrc.sample is checked in and serves as a template showing how the actual terraform/.envrc file should be structured.
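If you don't have the VPC and subnet IDs at hand, the AWS CLI can look them up (standard describe-vpcs / describe-subnets calls; adjust the availability zone to the one you want to use):

aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query 'Vpcs[0].VpcId' --output text

aws ec2 describe-subnets --filters Name=vpc-id,Values=<your-vpc-id> Name=availability-zone,Values=eu-central-1a --query 'Subnets[0].SubnetId' --output text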

AWS_PROFILE

The aws provider needs a way to connect to my AWS account. One way that I like very much is AWS CLI profiles. AWS_PROFILE holds the name of the profile that I have configured on my local machine and that I want to use for this project.

If you want to use this method, follow the instructions here and here.

Checkpoint 1

I will now double-check that everything works OK up until this point.

I run

terraform > terraform validate

It should print

Success! The configuration is valid.

And then I will actually create the resources.

Important! This is the point where costs will be incurred in your AWS Account.

terraform > terraform apply

I am presented with the plan and a prompt to reply with yes if I want to apply it:

Terraform will perform the following actions:

  # aws_instance.timescaledb will be created
  + resource "aws_instance" "timescaledb" {
      ...
    }

  # aws_key_pair.timescaledb will be created
  + resource "aws_key_pair" "timescaledb" {
      ...
    }

  # aws_security_group.postgres will be created
  + resource "aws_security_group" "postgres" {
      ...
    }

  # aws_security_group.ssh will be created
  + resource "aws_security_group" "ssh" {
      ...
    }

Plan: 4 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + ec2_timescaledb_public_ip = (known after apply)
  + ssh_connect               = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: 

I reply with yes and it takes about 2 minutes for everything to be ready.

Then I use the AWS Management Console to check the state of the new EC2 instance. I wait for it to become Running:

Then I use the ssh_connect output to connect to the instance with ssh. From the terraform folder again, I execute the following command (the shell first executes terraform output -raw ssh_connect and then executes its output):

terraform > $(terraform output -raw ssh_connect)

After I reply with yes to save the remote key to known hosts, I am inside the EC2 machine with an ssh shell:

Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-1030-aws x86_64)

 ...

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@ip-172-31-25-169:~$

EBS Volume - Data Storage

So far, so good.

But the EC2 instance doesn’t have TimescaleDB installed.

Also, it doesn’t have a specific instance-independent storage to save the actual database data.

I want:

  1. Every time the EC2 instance is created (or recreated), TimescaleDB is installed on it, AND
  2. If the EC2 instance is recreated, the TimescaleDB data from previous instance runs survives; i.e. I want to make sure that I don't lose my data whenever the EC2 instance, for any reason, is destroyed and created again from scratch.

Outline Of Steps

In order to satisfy these requirements, I will:

  1. Create an EBS volume and attach it to the EC2 instance.
  2. Format the volume and mount it under a directory (/data).
  3. Install Postgres and the TimescaleDB extension.
  4. Configure Postgres to store its data under /data, i.e. on the EBS volume.

AWS EBS Volume Resource

This is the resource configuration for the AWS EBS volume:

file: terraform/ec2_volume.tf

resource "aws_ebs_volume" "timescaledb_volume" {
  availability_zone = "${var.region}${var.timescaledb_server_availability_zone}"
  size              = 64
  type              = "gp3"
  encrypted         = false
  final_snapshot    = false

  tags = {
    Name = "${var.project}-${var.environment}-timescaledb-volume"
  }
}

Note: final_snapshot has been set to false. You may want to turn this to true so that AWS creates a snapshot when you destroy this volume, in case you want to recover old data. But setting it to true will incur costs.

This is quite minimal and many values are hard-coded. For example, the size is set to 64 GiB and the type to gp3. Obviously, go ahead and set your own values.
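For example, if you'd rather not hard-code the volume size, here is a sketch of pulling it out into an input variable (the variable name is my own, not from the post's repository):

variable "timescaledb_volume_size" {
  description = "Size of the TimescaleDB data EBS volume, in GiB."
  type        = number
  default     = 64
}

# and then, in the aws_ebs_volume resource:
#   size = var.timescaledb_volume_size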

If I terraform apply, Terraform will create an AWS EBS volume which will be standing alone, not attached to any EC2 instance.

Attach New Volume to EC2 Instance

This is how I am attaching the new volume to the EC2 instance:

file: terraform/ec2_volume.tf

resource "aws_volume_attachment" "timescaledb_volume_attachment" {
  device_name = "/dev/sdd"
  volume_id   = aws_ebs_volume.timescaledb_volume.id
  instance_id = aws_instance.timescaledb.id
}

It's quite simple, isn't it?

But then, one might ask: how did I come up with the value /dev/sdd for the device_name?

Explain /dev/sdd

If I look at the details of the AMI that I use, I get this picture with regards to block devices that come with it:

There are 3 devices that the AMI offers to EC2 instances launched from it. Only one of the 3, the one named /dev/sda1, is used as the 8 GiB root device. This is where the EC2 instance boots from. The devices /dev/sdb and /dev/sdc are ephemeral virtual devices which are not yet used by the EC2 instance.

This is another picture (taken from AWS documentation) of what’s going on with storage in this EC2 instance:

And if I look at the “Storage” tab on the instance details, I will see this:

So, the EC2 instance uses the /dev/sda1 device.

But how do I finally choose the name /dev/sdd for the new volume that I am attaching? The AWS documentation on device naming has the details.

According to the documentation and the AMI virtualization type, I am here:

Since /dev/sdb and /dev/sdc are already taken, I am going with /dev/sdd.

Note that I am not interested in mounting the /dev/sdb and /dev/sdc ephemeral devices. So, I just leave them unmounted.

Hence, after I terraform apply the new timescaledb_volume_attachment resource, the "Storage" tab of the instance looks like this:

I can see the new volume attached using the device name /dev/sdd.

Prepare To Store Data

But even though the EBS volume is attached to the EC2 machine, this doesn't mean that it is ready to store data.

If I ssh to the new EC2 instance and run the following command:

$ lsblk

I will see something like this:

NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0          7:0    0 27.2M  1 loop /snap/amazon-ssm-agent/11320
loop1          7:1    0 73.9M  1 loop /snap/core22/2010
loop2          7:2    0 50.9M  1 loop /snap/snapd/24718
nvme0n1      259:0    0    8G  0 disk 
├─nvme0n1p1  259:1    0    7G  0 part /
├─nvme0n1p14 259:2    0    4M  0 part 
├─nvme0n1p15 259:3    0  106M  0 part /boot/efi
└─nvme0n1p16 259:4    0  913M  0 part /boot
nvme1n1      259:5    0   64G  0 disk 

The last entry, nvme1n1, of size 64G, corresponds to the newly attached AWS EBS volume. But it doesn't have any partitions (like nvme0n1 does, for example).

I need to create a file system on the new volume and then mount it to a directory.

There are several ways to do that with Terraform. I chose to go with a terraform_data resource.

I create the file prepare_ebs_volume.tf with the following content:

file: terraform/prepare_ebs_volume.tf

resource "terraform_data" "prepare_ebs_volume_for_writing" {
  depends_on = [aws_volume_attachment.timescaledb_volume_attachment]

  triggers_replace = {
    volume_attachment = aws_volume_attachment.timescaledb_volume_attachment.id
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    host        = aws_instance.timescaledb.public_ip
    private_key = file("${path.module}/timescaledb.pem")
  }

  provisioner "remote-exec" {
    inline = concat(
      [
        "sudo file -s /dev/nvme1n1 | grep -q '/dev/nvme1n1: data$' && sudo mkfs -t xfs /dev/nvme1n1",
        "sudo mkdir /data",
        "sudo mount /dev/nvme1n1 /data",
        "sudo cp /etc/fstab /etc/fstab.bak",
        "echo \"UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1) /data xfs defaults,nofail 0 2\" | sudo tee -a /etc/fstab"
      ]
    )
  }
}
A few notes on this resource:

  1. The depends_on is critical to make sure that resources are created in the correct order, but
  2. the triggers_replace is critical to re-run the provisioner when the resource already exists. In this particular case, if the EBS volume is reattached (see, for example, the case later on in which the EC2 instance is destroyed and recreated), we want this provisioner to fire again.
  3. mkfs creates a file system of type xfs on the new volume. Note that I call mkfs only if there is no file system on the volume already, in order to avoid overwriting any existing data.
  4. mkdir creates the /data directory. This is where Postgres/TimescaleDB will be writing data. See later on.
  5. mount mounts the volume to the /data directory.
  6. The echo … updates the /etc/fstab file so that the volume is mounted to /data automatically whenever the instance reboots.

Note: If you want to read more about this process, read the official AWS documentation here.

I do terraform apply and then I ssh to the EC2 instance.

The command lsblk now prints:

NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0          7:0    0 27.2M  1 loop /snap/amazon-ssm-agent/11320
loop1          7:1    0 73.9M  1 loop /snap/core22/2010
loop2          7:2    0 50.9M  1 loop /snap/snapd/24718
nvme0n1      259:0    0    8G  0 disk 
├─nvme0n1p1  259:1    0    7G  0 part /
├─nvme0n1p14 259:2    0    4M  0 part 
├─nvme0n1p15 259:3    0  106M  0 part /boot/efi
└─nvme0n1p16 259:4    0  913M  0 part /boot
nvme1n1      259:5    0   64G  0 disk /data

Look at nvme1n1: instead of just disk, it now shows the /data mountpoint.

Bingo! If you go back to the Outline of Steps above, you will see that I have already completed the first two steps: the EBS volume is created and attached, and it is formatted and mounted under /data.

The next two steps: install Postgres with the TimescaleDB extension, and configure it to store its data under /data.

Install Postgres and TimescaleDB extension

I will now add two more terraform_data blocks that take care of installing Postgres with the TimescaleDB extension and configuring it to store data in the /data folder (which now resides on the new EBS volume).

I name the file install_and_setup_timescaledb.tf.

file: terraform/install_and_setup_timescaledb.tf.

locals {
  db_name                   = "events_server_${var.environment}"
  path_to_postgres_data_dir = "/data/postgresql/${var.postgresql_version}/main"
}

resource "terraform_data" "install_and_setup_timescaledb" {
  depends_on = [terraform_data.prepare_ebs_volume_for_writing]

  triggers_replace = {
    volume_attachment  = aws_volume_attachment.timescaledb_volume_attachment.id
    postgresql_version = var.postgresql_version
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    host        = aws_instance.timescaledb.public_ip
    private_key = file("${path.module}/timescaledb.pem")
  }

  provisioner "file" {
    source      = "${path.module}/install_postgres.sh"
    destination = "/home/ubuntu/install_postgres.sh"
  }

  provisioner "file" {
    source      = "${path.module}/install_timescaledb.sh"
    destination = "/home/ubuntu/install_timescaledb.sh"
  }

  provisioner "remote-exec" {
    inline = concat(
      [
        "sudo chmod u+x /home/ubuntu/install_postgres.sh",
        "sudo ./install_postgres.sh ${var.postgresql_version} ${var.timescaledb_server_port}  ${local.db_name} ${var.timescaledb_version}",

        "sudo chmod u+x /home/ubuntu/install_timescaledb.sh",
        "sudo ./install_timescaledb.sh ${var.postgresql_version} ${var.timescaledb_version} ${local.db_name}"
      ]
    )
  }
}

resource "terraform_data" "postgres_password" {
  depends_on = [terraform_data.install_and_setup_timescaledb]

  triggers_replace = {
    volume_attachment  = aws_volume_attachment.timescaledb_volume_attachment.id
    postgresql_version = var.postgresql_version
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    host        = aws_instance.timescaledb.public_ip
    private_key = file("${path.module}/timescaledb.pem")
  }

  provisioner "remote-exec" {
    inline = concat(
      [
        "echo '**************** remote-exec: Setting the postgres user password...'",
        "sudo -u postgres psql -c \"ALTER USER postgres WITH PASSWORD '${var.timescaledb_server_postgres_password}';\""
      ]
    )
  }
}

output "psql_connect" {
  value       = "psql -h ${aws_instance.timescaledb.public_ip} -U postgres -d ${local.db_name} -p ${var.timescaledb_server_port}"
  description = "Command to connect to the PostgreSQL database using psql."
}

Dependency: The first resource, install_and_setup_timescaledb, depends on the previous one, terraform_data.prepare_ebs_volume_for_writing. Hence, its remote-exec commands are executed after that block's commands finish.

Triggers Replace: However, besides the order of creation, which is specified with the depends_on argument, I also use the triggers_replace block to specify that I want this resource to be replaced under two conditions: when the volume attachment changes (for example, because the EC2 instance has been recreated and the volume reattached), and when the PostgreSQL version changes.

Two bash Scripts

I use two bash scripts: install_postgres.sh and install_timescaledb.sh.

Note that both scripts are also called when upgrading the Postgres version.

Installing Postgres

The file install_postgres.sh has the following contents:

file: terraform/install_postgres.sh

#!/bin/bash

set -e # e: exit if any command has a non-zero exit status
set -x # x: all executed commands are printed to the terminal
set -u # u: all references to variables that have not been previously defined cause an error

NEW_VERSION=$1
PORT=$2
DB_NAME=$3
TIMESCALEDB_VERSION="${4}*"

# Create the PostgreSQL data directory
sudo mkdir -p /data/postgresql

# Update the package list
sudo DEBIAN_FRONTEND=noninteractive apt update -y

# Install postgres common tools
sudo DEBIAN_FRONTEND=noninteractive apt install -y postgresql-common apt-transport-https net-tools

# Enable the PostgreSQL APT repository
sudo DEBIAN_FRONTEND=noninteractive /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y

# Let's now install on top of any existing
CURRENT_CLUSTER=$(sudo pg_lsclusters -h 2>/dev/null | grep "${PORT}" | awk '{print $1"-"$2}' || true)

INSTALL="yes"
OLD_VERSION=""
OLD_NAME="main"
if [ -n "$CURRENT_CLUSTER" ]; then
  echo "Found existing PostgreSQL cluster:"
  echo "$CURRENT_CLUSTER"
  echo

  OLD_VERSION=$(echo $CURRENT_CLUSTER | cut -d'-' -f1)
  OLD_NAME=$(echo $CURRENT_CLUSTER | cut -d'-' -f2)

  # Skip the stop/disable when this is the same version and name we're about to install
  # (there is no surrounding loop here, so if/else is used instead of `continue`)
  if [ "$OLD_VERSION" = "$NEW_VERSION" ] && [ "$OLD_NAME" = "main" ]; then
      echo "Skipping cluster $OLD_VERSION/$OLD_NAME as it matches the target version and name"
      INSTALL="no"
  else
      echo "************************ stopping and disabling $OLD_VERSION/$OLD_NAME ************************"
      sudo systemctl stop postgresql@$OLD_VERSION-$OLD_NAME || true
      sudo systemctl status postgresql@$OLD_VERSION-$OLD_NAME --no-pager || true
      sudo systemctl disable postgresql@$OLD_VERSION-$OLD_NAME || true
      sudo pg_ctlcluster stop $OLD_VERSION $OLD_NAME || true
      sudo pg_lsclusters -h
  fi
else
  echo "No existing PostgreSQL clusters found on port ${PORT}. Proceeding with installation."
  echo

  INSTALL="yes"
fi

echo "****************** INSTALL: $INSTALL"
if [ "$INSTALL" = "yes" ]; then
  sudo DEBIAN_FRONTEND=noninteractive apt install -y postgresql-$NEW_VERSION postgresql-client-$NEW_VERSION postgresql-contrib-$NEW_VERSION
  sudo DEBIAN_FRONTEND=noninteractive apt install -y postgresql-server-dev-$NEW_VERSION
  # When I install postgresql for the first time, the cluster is already created.
  # But when I install a new version while another already exists, the cluster is not created.
  if sudo pg_lsclusters -h 2>/dev/null | grep -q "^${NEW_VERSION}[[:space:]]\+main[[:space:]]"; then
    echo "Cluster $NEW_VERSION/main already exists"
  else
    echo "Creating cluster $NEW_VERSION/main"
    sudo pg_createcluster $NEW_VERSION main
    sudo pg_ctlcluster start $NEW_VERSION main
    sudo pg_lsclusters -h
  fi
  sudo systemctl start postgresql@${NEW_VERSION}-main
  sudo systemctl enable postgresql@$NEW_VERSION-main
else
  echo "Skipping installation of PostgreSQL ${NEW_VERSION} as it is already installed."
fi

# Show final status

echo "Current PostgreSQL clusters:"
sudo pg_lsclusters -h

sudo systemctl status postgresql@${NEW_VERSION}-main --no-pager

if [ -n "$OLD_VERSION" ]; then
  echo "Stopping and disabling old PostgreSQL cluster $OLD_VERSION/$OLD_NAME"
  sudo pg_ctlcluster stop $OLD_VERSION $OLD_NAME || true
else
  echo "No old PostgreSQL cluster to stop."
fi

# Stop postgres from running
sudo systemctl stop postgresql@${NEW_VERSION}-main

# Change where PostgreSQL stores its data
ORIGINAL_DATA_DIR="/var/lib/postgresql/${NEW_VERSION}/main"

if [ -d "${ORIGINAL_DATA_DIR}" ];then
  if [ -d "/var/lib/postgresql/${NEW_VERSION}/main.bak" ]; then
    rm -f -R /var/lib/postgresql/${NEW_VERSION}/main.bak
  fi
  sudo mv ${ORIGINAL_DATA_DIR} /var/lib/postgresql/${NEW_VERSION}/main.bak
fi

NEW_PATH_TO_POSTGRES_DATA_DIR="/data/postgresql/${NEW_VERSION}/main"

sudo mkdir -p ${NEW_PATH_TO_POSTGRES_DATA_DIR}
sudo chown -R postgres:postgres /data/postgresql
if [ -n "$(sudo ls -A ${NEW_PATH_TO_POSTGRES_DATA_DIR} 2>/dev/null)" ]; then
  echo "The new PostgreSQL data directory is not empty. We will not initialize it."
else
  echo "Initializing new PostgreSQL data directory at ${NEW_PATH_TO_POSTGRES_DATA_DIR}"
  sudo -u postgres /usr/lib/postgresql/${NEW_VERSION}/bin/initdb -D ${NEW_PATH_TO_POSTGRES_DATA_DIR}
fi
sudo sed -i "s|data_directory = '${ORIGINAL_DATA_DIR}'|data_directory = '${NEW_PATH_TO_POSTGRES_DATA_DIR}'|g" /etc/postgresql/${NEW_VERSION}/main/postgresql.conf

# Set the port to whatever we specify as port in the terraform variables
sudo sed -i "s|port = .*|port = ${PORT}|g" /etc/postgresql/${NEW_VERSION}/main/postgresql.conf

# Allow remote connections
sudo sed -i "s|#listen_addresses = 'localhost'|listen_addresses = '*'|g" /etc/postgresql/${NEW_VERSION}/main/postgresql.conf
sudo sed -i "s|host    all             all             127.0.0.1/32            scram-sha-256|host    all             all             0.0.0.0/0            scram-sha-256|g" /etc/postgresql/${NEW_VERSION}/main/pg_hba.conf

# upgrade from previous version if needed
LAST_CLUSTER=$(echo "$CURRENT_CLUSTER" | tail -n 1)
if [ -n "$LAST_CLUSTER" ]; then
    OLD_VERSION=$(echo $LAST_CLUSTER | cut -d'-' -f1)
    OLD_NAME=$(echo $LAST_CLUSTER | cut -d'-' -f2)
    if [ "$OLD_VERSION" = "$NEW_VERSION" ] && [ "$OLD_NAME" = "main" ]; then
      echo "...no need to upgrade data, we are on the same cluster version and name"
    else
      echo "We need to upgrade the data of the last cluster ${OLD_VERSION}-${OLD_NAME}"
      # We will need to install timescale db for the new version, otherwise the pg_upgrade will fail
      # ---------------------------------------------------------------------------------------------
      sudo apt install timescaledb-2-postgresql-${NEW_VERSION}="${TIMESCALEDB_VERSION}" timescaledb-2-loader-postgresql-${NEW_VERSION}="${TIMESCALEDB_VERSION}" timescaledb-toolkit-postgresql-${NEW_VERSION} -y

      sudo sed -i "s|#shared_preload_libraries = ''|shared_preload_libraries = 'timescaledb'|g" /data/postgresql/${NEW_VERSION}/main/postgresql.conf

      # Tune TimescaleDB
      sudo timescaledb-tune --yes
      # --------------- end of installing timescale db for the new version -----------------------------

      (cd /tmp && sudo -u postgres /usr/lib/postgresql/${NEW_VERSION}/bin/pg_upgrade \
        --old-datadir=/data/postgresql/$OLD_VERSION/$OLD_NAME \
        --new-datadir=/data/postgresql/$NEW_VERSION/main \
        --old-bindir=/usr/lib/postgresql/$OLD_VERSION/bin \
        --new-bindir=/usr/lib/postgresql/$NEW_VERSION/bin)
    fi
fi

# Restart PostgreSQL to apply changes
sudo systemctl restart postgresql@${NEW_VERSION}-main

# Wait for PostgreSQL to start
sleep 10

# Create the database
sudo -u postgres psql -c "create database ${DB_NAME};" || echo "Database ${DB_NAME} already exists, skipping creation."

I believe that the comments in between the bash script statements are enough to understand what is going on.

Note that the script makes sure that:

  1. it is idempotent: if the target Postgres version/cluster already exists, the installation is skipped,
  2. the Postgres data directory is relocated to /data (i.e. to the EBS volume) and is initialized only if it is empty,
  3. the server listens on the configured port and accepts remote connections, and
  4. when an older cluster exists, its data is upgraded to the new version with pg_upgrade.

Installing TimescaleDB

The file install_timescaledb.sh has the following contents:

file: terraform/install_timescaledb.sh.

#!/bin/bash

set -e # e: exit if any command has a non-zero exit status
set -x # x: all executed commands are printed to the terminal
set -u # u: all references to variables that have not been previously defined cause an error

POSTGRESQL_VERSION=$1
TIMESCALEDB_VERSION="${2}*"
DB_NAME=$3
# Set TimescaleDB repository at system repositories so that we can install TimescaleDB
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor --yes -o /etc/apt/trusted.gpg.d/timescaledb.gpg
sudo DEBIAN_FRONTEND=noninteractive apt update -y

sudo systemctl stop postgresql@${POSTGRESQL_VERSION}-main || true

# Install TimescaleDB, the TimescaleDB Toolkit, and all TimescaleDB Postgres-related packages
sudo apt install timescaledb-2-postgresql-${POSTGRESQL_VERSION}="${TIMESCALEDB_VERSION}" timescaledb-2-loader-postgresql-${POSTGRESQL_VERSION}="${TIMESCALEDB_VERSION}" timescaledb-toolkit-postgresql-${POSTGRESQL_VERSION} -y

sudo sed -i "s|#shared_preload_libraries = ''|shared_preload_libraries = 'timescaledb'|g" /data/postgresql/${POSTGRESQL_VERSION}/main/postgresql.conf

# Tune TimescaleDB
sudo timescaledb-tune --yes

sudo systemctl restart postgresql@${POSTGRESQL_VERSION}-main

# Create the extensions for TimescaleDB and TimescaleDB Toolkit
sudo -u postgres psql -d ${DB_NAME} -c "CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;"
sudo -u postgres psql -d ${DB_NAME} -c "CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit CASCADE;"

Again, this script is idempotent, and it supports being re-run, for example when the Postgres version is upgraded.

It installs the TimescaleDB extension and the TimescaleDB Toolkit.

Also, it sets some common default values for TimescaleDB configuration.

Setting the Password

The second terraform_data resource sets the password for the postgres user. I keep it in a separate block because it uses sensitive data; if I put it in the first block, Terraform would hide all output from that block (the Postgres installation and setup commands), which I wouldn't want.

Important! These resources introduce more input variables. My variables.tf needs to have these too:

variable "postgresql_version" {
  description = "The PostgreSQL version to install on the TimescaleDB server."
  type        = string
  default     = "16"
}

variable "timescaledb_version" {
  description = "The TimescaleDB version to install on the TimescaleDB server."
  type        = string
  default     = "2.19.3"
}

variable "timescaledb_server_port" {
  description = "Port for the TimescaleDB server."
  type        = number
  default     = 5432
}

variable "timescaledb_server_postgres_password" {
  description = "The password for the PostgreSQL user on the TimescaleDB server."
  type        = string
  sensitive   = true
}

Note: At the time of this writing, the latest Postgres version was 17, but in the file above I choose to install version 16 to start with. Later in the post, you will see how I upgrade to 17, in order to demonstrate the Postgres upgrade process.

Important! Also look at the .envrc file (or whichever file you use to keep the environment variables that set values for the Terraform input variables); you need to add TF_VAR_timescaledb_server_postgres_password=…

...
export TF_VAR_timescaledb_server_postgres_password='...<put your postgres password here>...'

If I do terraform apply, at the end, I have my TimescaleDB ready.

I use this to connect (from the terraform folder):

terraform > $(terraform output -raw psql_connect)

Then I key in the postgres user password and I am on a psql terminal.
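If you don't want to type the password interactively every time, psql also honors the standard PGPASSWORD environment variable (or a ~/.pgpass file), so something like this works too:

terraform > PGPASSWORD="$TF_VAR_timescaledb_server_postgres_password" $(terraform output -raw psql_connect)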

Let's look at the output of some commands.

Check that Extensions are Installed

I connect to the events_server_development database and issue the command to list the installed extensions:

postgres=# \c events_server_development 
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
You are now connected to database "events_server_development" as user "postgres".
events_server_development=# select * from pg_extension;
  oid  |       extname       | extowner | extnamespace | extrelocatable | extversion |                                                                                 extconfig                                                                                 |                                                              extcondition                                                               
-------+---------------------+----------+--------------+----------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------
 13610 | plpgsql             |       10 |           11 | f              | 1.0        |                                                                                                                                                                           | 
 17140 | timescaledb_toolkit |       10 |         2200 | f              | 1.21.0     |                                                                                                                                                                           | 
 16385 | timescaledb         |       10 |         2200 | f              | 2.19.3     | {16406,16407,16429,16443,16442,16462,16461,16477,16476,16502,16518,16519,16536,16535,16555,16556,16611,16624,16651,16664,16674,16684,16688,16704,16715,16731,16740,16739} | {"","WHERE id >= 1","","","","","","","","","","","","","","WHERE id >= 1000"," WHERE key <> 'uuid' ","","","","","","","","","","",""}
(3 rows)

Creating a Table and A Row

I create a table and insert a row.

postgres=# \c events_server_development 
psql (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1), server 17.5 (Ubuntu 17.5-1.pgdg24.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
You are now connected to database "events_server_development" as user "postgres".
events_server_development=# create table events (id bigint not null, type varchar not null, created_at timestamp default current_timestamp not null, primary key(id));
CREATE TABLE
Time: 37,602 ms
events_server_development=# insert into events (id, type) values (1, 'product_sold');
INSERT 0 1
Time: 31,945 ms
events_server_development=# select * from events;
 id |     type     |         created_at         
----+--------------+----------------------------
  1 | product_sold | 2025-07-06 12:37:03.418051
(1 row)

Time: 31,053 ms

So, I have created the events table and added one row to it.
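Note that events above is a plain Postgres table; it only proves the database is up. To sanity-check TimescaleDB itself, here is a minimal hypertable sketch (my own example, not part of the demo repository; TimescaleDB requires any unique constraint to include the time column, which is why this table has no separate id primary key):

create table measurements (
  time        timestamptz      not null,
  device_id   text             not null,
  temperature double precision
);

-- turn the plain table into a hypertable, partitioned by the time column
select create_hypertable('measurements', 'time');

insert into measurements (time, device_id, temperature) values (now(), 'dev-1', 21.5);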

Let's now look at some scenarios in which we would like our data to remain intact.

What if EC2 Instance is Upgraded?

The current EC2 instance type is t3.xlarge. What if I want to scale up to t3.2xlarge, for example?

I update the default value of the variable timescaledb_server_instance_type to t3.2xlarge and I do terraform apply.

The plan presented is very simple:

  # aws_instance.timescaledb will be updated in-place
  ~ resource "aws_instance" "timescaledb" {
        id                                   = "i-069be2f31daf52118"
      ~ instance_type                        = "t3.xlarge" -> "t3.2xlarge"
      ~ public_dns                           = "ec2-63-178-242-62.eu-central-1.compute.amazonaws.com" -> (known after apply)
      ~ public_ip                            = "63.178.242.62" -> (known after apply)
        tags                                 = {
            "Name" = "setting_up_timescaledb_on_aws-development-timescaledb"
        }
        # (35 unchanged attributes hidden)

        # (9 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Only a change of the instance type.

I apply, and then I connect to TimescaleDB with psql. It works. My data is still there.

What if EC2 Instance is Replaced?

I do terraform taint aws_instance.timescaledb to mark the EC2 instance as a resource that needs to be destroyed and created again by Terraform.

Then I do terraform apply. I am presented with a plan to destroy and recreate the EC2 instance, replace the volume attachment, and re-run the terraform_data provisioners.

This is exactly what I want: a new EC2 instance, with the EBS volume detached from the old one and attached to the new one, plus installation of Postgres on the new machine.

I proceed.

Then I test that I can connect with psql and that I can find my data in place.

Boom! Data there! All good!

What if the EBS Volume is Scaled Up?

I go to the ec2_volume.tf file and increase the storage size from 64 to 128 GiB.

Then I do terraform apply.

I am presented with a simple plan: the EBS volume will be updated in place.

I then connect to the database and I can still see my data intact.
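One thing to keep in mind (my addition, not something the Terraform code above handles): growing the EBS volume only enlarges the block device; the xfs file system on it still needs to be grown before Postgres can use the extra space. From an ssh session on the instance:

$ sudo xfs_growfs -d /data
$ df -h /data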

What If Postgres Version is Upgraded?

I change the default value of the postgresql_version variable from 16 to 17. Thanks to the triggers_replace of the resources install_and_setup_timescaledb and postgres_password, these resources will be recreated/replaced.

After I do terraform apply, I connect to the Postgres server. I can still see my events table and my events record.
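A quick way to confirm the upgrade from the psql prompt (plain Postgres/psql commands):

-- should now report PostgreSQL 17.x
select version();

-- list the installed extensions and their versions
\dx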

What If TimescaleDB Version is Upgraded?

This is not supported by the current scripts. Upgrading the TimescaleDB minor and/or major version is quite an involved process, which is described here: Upgrade TimescaleDB.

What Could Be Improved?

This setup is good to start with, but it has plenty of room for improvement in terms of production readiness.

Closing Thoughts

Running your own, self-hosted TimescaleDB on AWS is quite possible; the work above demonstrates that. However, it needs more work before it could become a real production environment.


gm! I am Panos Matsinopoulos, humble software reader, writer, classical music and V8 engine lover. I work for Talent Protocol, where we help builders get the recognition they deserve.