ChatGPT grew from zero to 100 million users in two months. Today it serves over 800 million weekly users. That makes it one of the fastest-growing applications in history.
And their database? PostgreSQL. The same 30-year-old relational database running your side project.
Most engineering teams would have panicked at 10 million users and started a multi-year migration to Cassandra, DynamoDB, or some other NoSQL system. OpenAI did not. They scaled PostgreSQL itself. And the techniques they used are not exotic - they are patterns any backend team can apply today.
Here are the 7 strategies, in the order you would likely need them.
Strategy 1: Connection Pooling (The First Thing That Breaks)
The first bottleneck is never query speed. It is connections.
Every PostgreSQL connection costs ~10MB of memory. If you have 1,000 app instances each opening 10 connections, that is 10,000 connections consuming 100GB of RAM - just for connection overhead. PostgreSQL handles hundreds of connections well. Thousands badly. Tens of thousands not at all.
The fix: PgBouncer.
PgBouncer sits between your application and PostgreSQL. Instead of each app instance maintaining its own connections, they all talk to PgBouncer, which maintains a much smaller pool of actual database connections.
Without PgBouncer: 1,000 app instances x 10 connections = 10,000 DB connections
With PgBouncer: 1,000 app instances -> PgBouncer -> 200 DB connections
50x reduction. Here is the key configuration:
[pgbouncer]
; Assign a server connection per transaction, not per session
pool_mode = transaction
; Connections FROM apps TO PgBouncer
max_client_conn = 10000
; Connections FROM PgBouncer TO PostgreSQL
default_pool_size = 100
Use transaction mode for web applications. It gives the best connection reuse while still supporting transactions.
If you are using Django:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        # ...NAME, USER, PASSWORD as usual
        'HOST': 'pgbouncer.internal',  # Point to PgBouncer, not PostgreSQL
        'PORT': '6432',
        'CONN_MAX_AGE': 0,             # Let PgBouncer manage pooling
    }
}
Set CONN_MAX_AGE = 0. Django's persistent connections (CONN_MAX_AGE > 0) would pin PgBouncer sessions and defeat transaction-level pooling. Let PgBouncer handle connection reuse.
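If you are not on Django, the principle is identical: point your driver at PgBouncer and keep client-side pooling out of its way. A minimal psycopg2 sketch, reusing the pgbouncer.internal host from above - the database name and credentials are placeholders:
import psycopg2

# Connect to PgBouncer on 6432, not PostgreSQL on 5432.
# "appdb" and the credentials are placeholders for your own settings.
conn = psycopg2.connect(
    host="pgbouncer.internal",
    port=6432,
    dbname="appdb",
    user="app_user",
    password="secret",
)

# In transaction pooling mode, avoid session state (SET, LISTEN, session-level
# prepared statements): each transaction may land on a different server connection.
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())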
Strategy 2: Read Replicas (Because 80%+ of Your Traffic is Reads)
ChatGPT users read their conversation history far more often than they create new messages. Most applications follow the same pattern - 80-90% reads, 10-20% writes.
Read replicas are copies of your primary database that handle read queries:
Writes -> Primary database
Reads -> Replica 1, Replica 2, Replica 3 (load balanced)
Primary replicates changes to all replicas
Your application routes traffic:
import random

class DatabaseRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def read_connection(self):
        # Spread reads across the replicas
        return random.choice(self.replicas)

    def write_connection(self):
        # Every write goes to the primary
        return self.primary
The catch: replication lag. Replicas are slightly behind the primary. For most reads, a few milliseconds of lag does not matter. But sometimes you need to read your own writes immediately:
# After updating a user, read from primary for fresh data
update_user(user_id, new_email)
user = get_user(user_id, use_primary=True) # Force primary read
This read/write separation is a core distributed systems pattern. It is formalized as CQRS (Command Query Responsibility Segregation), which takes the idea further by using separate data models for reads and writes.
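Here is a rough sketch of how the get_user(..., use_primary=True) call above could be wired to the DatabaseRouter - assuming a router instance of that class; the helper names are illustrative, not a specific library API:
def get_user(user_id, use_primary=False):
    # Force the primary when the caller has just written and needs fresh data;
    # otherwise spread the read across replicas.
    conn = router.write_connection() if use_primary else router.read_connection()
    return conn.execute("SELECT id, name, email FROM users WHERE id = %s", [user_id])

def update_user(user_id, new_email):
    # Writes always go to the primary
    conn = router.write_connection()
    conn.execute("UPDATE users SET email = %s WHERE id = %s", [new_email, user_id])

# Read-your-own-writes: update, then force the next read onto the primary
update_user(42, "new@example.com")
user = get_user(42, use_primary=True)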
Strategy 3: Query Optimization (Fix Queries Before Scaling Infrastructure)
A bad query on 10 shards is still a bad query. Before adding more hardware, fix your SQL.
Use EXPLAIN ANALYZE to find the problem:
EXPLAIN ANALYZE
SELECT m.*, u.name
FROM messages m
JOIN users u ON m.user_id = u.id
WHERE m.created_at > '2026-01-01'
ORDER BY m.created_at DESC
LIMIT 100;
If you see Seq Scan on a large table, you are missing an index. If you see a high Rows Removed by Filter count, the query is scanning far more rows than it returns.
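For the query above, an index that matches both the filter and the sort order lets the planner skip the sequential scan entirely. A sketch, assuming an open psycopg2-style connection conn to the database with that messages table:
# CONCURRENTLY builds the index without blocking writes, but it must run
# outside a transaction block - hence autocommit.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_created_at "
        "ON messages (created_at DESC)"
    )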
Three query patterns that kill performance at scale:
1. Functions on indexed columns:
-- BAD: index on created_at is ignored
WHERE DATE(created_at) = '2026-01-15'
-- GOOD: index on created_at is used
WHERE created_at >= '2026-01-15' AND created_at < '2026-01-16'
2. SELECT * when you only need 3 columns:
-- BAD: fetches all columns
SELECT * FROM users WHERE id = 123;
-- GOOD: only what you need
SELECT id, name, email FROM users WHERE id = 123;
3. N+1 queries (the silent killer):
# BAD: 101 queries
users = db.execute("SELECT * FROM users LIMIT 100")
for user in users:
    messages = db.execute("SELECT * FROM messages WHERE user_id = %s", [user.id])

# GOOD: 1 query
results = db.execute("""
    SELECT u.*, m.*
    FROM users u
    LEFT JOIN messages m ON m.user_id = u.id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
Index strategy matters too. Composite indexes must follow the leftmost prefix rule - a detail I see developers get wrong constantly. The column order in CREATE INDEX idx ON orders(customer_id, order_date) is not interchangeable. I wrote a full guide on this: How Database Indexing Works.
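As a quick illustration of the leftmost prefix rule with that orders index - hypothetical queries, but this is the standard planner behaviour:
# Index: CREATE INDEX idx ON orders (customer_id, order_date)

# Uses the index: filters on the leftmost column
db.execute("SELECT * FROM orders WHERE customer_id = %s", [42])

# Uses the index: leftmost column plus the next one
db.execute(
    "SELECT * FROM orders WHERE customer_id = %s AND order_date >= %s",
    [42, "2026-01-01"],
)

# Cannot use the index efficiently: the leftmost column is skipped,
# so the order_date part of the index is no help on its own
db.execute("SELECT * FROM orders WHERE order_date >= %s", ["2026-01-01"])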
Strategy 4: Caching (The Fastest Query is One You Don’t Make)
Not every request needs to hit the database. OpenAI uses Redis to cache frequently accessed data.
import json

# `redis` is a configured client and `db` a database handle, as elsewhere in this post
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)          # Cache hit: skip the database entirely
    user = db.execute("SELECT * FROM users WHERE id = %s", [user_id])
    redis.setex(f"user:{user_id}", 300, json.dumps(user))  # Cache 5 min
    return user
This pattern is cache-aside: check the cache first, fall back to the database, populate the cache on a miss. But the devil is in the details - invalidation, choosing TTLs, and deciding between cache-aside, read-through, and write-through.
There is no single right answer. The choice depends on your read/write ratio and consistency requirements.
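A common starting point, though, is write-then-invalidate: delete the cached key whenever the underlying row changes and let the next read repopulate it. A sketch using the same user:{id} key scheme as above:
def update_user_email(user_id, new_email):
    # 1. Write to the database first
    db.execute("UPDATE users SET email = %s WHERE id = %s", [new_email, user_id])

    # 2. Invalidate the cached copy; the next get_user() call repopulates it.
    #    Deleting is safer than updating the cache in place - under concurrency
    #    you risk writing a stale value that then lives for the full TTL.
    redis.delete(f"user:{user_id}")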
Strategy 5: Horizontal Sharding (When One Instance Is Not Enough)
When a single PostgreSQL instance cannot handle your data or traffic, you split data across multiple instances.
Shard Router
|-- Shard 1: Users 1-1M
|-- Shard 2: Users 1M-2M
|-- Shard 3: Users 2M-3M
|-- Shard N: Users 3M+
Each shard is a complete PostgreSQL instance with the same schema. A shard router determines which shard holds which data:
import zlib

class ShardRouter:
    def __init__(self, shard_connections):
        self.shards = shard_connections
        self.num_shards = len(shard_connections)

    def get_shard(self, user_id):
        # Use a stable hash: Python's built-in hash() is randomized per process,
        # which would route the same user to different shards on different servers
        return zlib.crc32(str(user_id).encode()) % self.num_shards

    def query(self, user_id, sql, params):
        shard_id = self.get_shard(user_id)
        return self.shards[shard_id].execute(sql, params)
Choosing the right shard key is critical:
| Good shard keys | Bad shard keys |
|---|---|
| user_id (even distribution) | created_at (hot spots on recent data) |
| tenant_id | country (uneven distribution) |
| organization_id | status (low cardinality) |
The tooling has matured significantly. You do not have to build sharding from scratch:
- Citus - PostgreSQL extension for transparent sharding (most popular for Postgres)
- Vitess - sharding middleware used by YouTube and Slack (built for MySQL, but the same pattern)
- pgcat - lightweight sharding with connection pooling built-in
My advice: Start with application-level sharding to understand your access patterns. Move to Citus when managing shards manually becomes painful.
Strategy 6: Connection Management at Scale
At massive scale, connection management becomes its own discipline:
-- Kill queries after 30 seconds
SET statement_timeout = '30s';
-- Kill idle transactions after 60 seconds
SET idle_in_transaction_session_timeout = '60s';
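Rather than trusting every client to set these, you can make them defaults so every new session inherits them - a sketch run once with an admin connection; the database and role names are placeholders:
# Every new session against this database inherits the timeouts
admin_db.execute("ALTER DATABASE appdb SET statement_timeout = '30s'")
admin_db.execute("ALTER DATABASE appdb SET idle_in_transaction_session_timeout = '60s'")

# Or scope them to the application role only
admin_db.execute("ALTER ROLE app_user SET statement_timeout = '30s'")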
And monitor ruthlessly:
-- Find long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '1 minute';
Strategy 7: Monitoring Everything
You cannot fix what you cannot see.
Essential metrics:
| Metric | Why it matters |
|---|---|
| Connection count | Approaching limits causes cascading failures |
| Query latency (p50, p95, p99) | Direct user experience impact |
| Replication lag | Data freshness on replicas |
| Cache hit rate | Database load indicator |
| Slow query count | Performance regressions |
Enable pg_stat_statements to track query performance across your entire workload:
-- Requires pg_stat_statements in shared_preload_libraries (needs a restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 queries by total execution time
SELECT query, calls, total_exec_time / 1000 as total_seconds, mean_exec_time as avg_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This is how you find the 3 queries responsible for 80% of your database load.
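Most of these numbers can be scraped directly from PostgreSQL's statistics views and shipped to whatever monitoring system you already run. A rough sketch of the collection side, assuming an open psycopg2-style connection conn to the primary:
import time

# Each entry: metric name -> SQL returning a single numeric value.
METRIC_QUERIES = {
    "connection_count": "SELECT count(*) FROM pg_stat_activity",
    "active_queries": "SELECT count(*) FROM pg_stat_activity WHERE state = 'active'",
    # Worst replica lag in bytes, measured from the primary
    "max_replication_lag_bytes": """
        SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)
        FROM pg_stat_replication
    """,
}

def collect_metrics(conn):
    metrics = {}
    with conn.cursor() as cur:
        for name, sql in METRIC_QUERIES.items():
            cur.execute(sql)
            metrics[name] = cur.fetchone()[0]
    return metrics

while True:
    print(collect_metrics(conn))  # swap print() for your metrics exporter
    time.sleep(15)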
The Complete Architecture
Here is how all 7 strategies fit together:
Application Servers (1,000+)
            |
            v
Redis Cache (check first, avoid DB when possible)
            |
            v  (cache miss)
PgBouncer (10,000 app connections -> 200 DB connections)
            |
            v
Shard Router (route by user_id)
           /  \
          /    \
     Shard 1    Shard 2
     Primary    Primary
      /   \      /   \
    R1     R2  R1     R2     (read replicas per shard)
Request flow:
- App checks Redis cache
- Cache miss goes to PgBouncer
- PgBouncer routes to correct shard
- Reads go to replicas, writes go to primary
- Result is cached in Redis for next time
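Pulled together in code, the read path looks roughly like this - reusing the redis client, the ShardRouter, and the cache-aside pattern sketched earlier (the names come from those sketches, not from a specific library):
import json

CACHE_TTL_SECONDS = 300

def fetch_user(user_id):
    # 1. Cache first: the fastest query is the one you never make
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss: the router picks the shard by user_id, and the shard's
    #    connection goes through PgBouncer to a replica (reads) or the primary (writes)
    user = shard_router.query(
        user_id,
        "SELECT id, name, email FROM users WHERE id = %s",
        [user_id],
    )

    # 3. Populate the cache for the next request
    redis.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(user))
    return user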
The Order Matters
This is the sequence most teams should follow:
- Connection pooling - easiest win, biggest immediate impact
- Query optimization - fix the SQL before throwing hardware at it
- Read replicas - handle the 80%+ read traffic
- Caching - reduce database load for hot data
- Horizontal sharding - only when a single instance is genuinely not enough
- Connection management - timeouts, limits, and hygiene
- Monitoring - should actually be #1, but most teams add it later
The key insight: PostgreSQL scales further than most people think. The database is rarely the problem. Bad queries, missing indexes, no connection pooling, no caching - those are the problems. Fix them and PostgreSQL will carry you to hundreds of millions of users.
For more engineering deep dives on how real systems work at scale, visit singhajit.com.