ChatGPT grew from zero to 100 million users in two months. Today it serves over 800 million weekly users. That makes it one of the fastest-growing applications in history.
And their database? PostgreSQL. The same 30-year-old relational database running your side project.
Most engineering teams would have panicked at 10 million users and started a multi-year migration to Cassandra, DynamoDB, or some other NoSQL system. OpenAI did not. They scaled PostgreSQL itself. And the techniques they used are not exotic - they are patterns any backend team can apply today.
Here are the 7 strategies, in the order you would likely need them.
Strategy 1: Connection Pooling (The First Thing That Breaks)
The first bottleneck is never query speed. It is connections.
Every PostgreSQL connection costs ~10MB of memory. If you have 1,000 app instances each opening 10 connections, that is 10,000 connections consuming 100GB of RAM - just for connection overhead. PostgreSQL handles hundreds of connections well. Thousands badly. Tens of thousands not at all.
The fix: PgBouncer.
PgBouncer sits between your application and PostgreSQL. Instead of each app instance maintaining its own connections, they all talk to PgBouncer, which maintains a much smaller pool of actual database connections.
Without PgBouncer: 1,000 app instances x 10 connections = 10,000 DB connections
With PgBouncer: 1,000 app instances -> PgBouncer -> 200 DB connections
50x reduction. Here is the key configuration:
[pgbouncer]
; Assign a server connection per transaction, not per session
pool_mode = transaction
; Connections FROM apps TO PgBouncer
max_client_conn = 10000
; Connections FROM PgBouncer TO PostgreSQL
default_pool_size = 100
Use transaction mode for web applications. It gives the best connection reuse while still supporting transactions.
If you are using Django:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        # ...NAME, USER, PASSWORD as usual
        'HOST': 'pgbouncer.internal',  # Point to PgBouncer, not PostgreSQL
        'PORT': '6432',
        'CONN_MAX_AGE': 0,             # Let PgBouncer manage pooling
    }
}
Set CONN_MAX_AGE = 0. Django's persistent connections (CONN_MAX_AGE > 0) would pin PgBouncer sessions and defeat transaction-level pooling. Let PgBouncer handle connection reuse.
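If you are not on Django, the principle is identical: point your driver at PgBouncer and keep client-side pooling out of its way. A minimal psycopg2 sketch, reusing the pgbouncer.internal host from above - the database name and credentials are placeholders:
import psycopg2

# Connect to PgBouncer on 6432, not PostgreSQL on 5432.
# "appdb" and the credentials are placeholders for your own settings.
conn = psycopg2.connect(
    host="pgbouncer.internal",
    port=6432,
    dbname="appdb",
    user="app_user",
    password="secret",
)

# In transaction pooling mode, avoid session state (SET, LISTEN, session-level
# prepared statements): each transaction may land on a different server connection.
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())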
Strategy 2: Read Replicas (Because 80%+ of Your Traffic is Reads)
ChatGPT users read their conversation history far more often than they create new messages. Most applications follow the same pattern - 80-90% reads, 10-20% writes.
Read replicas are copies of your primary database that handle read queries:
Writes -> Primary database
Reads -> Replica 1, Replica 2, Replica 3 (load balanced)
Primary replicates changes to all replicas
Your application routes traffic:
import random

class DatabaseRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def read_connection(self):
        # Spread reads across the replicas
        return random.choice(self.replicas)

    def write_connection(self):
        # Every write goes to the primary
        return self.primary
The catch: replication lag. Replicas are slightly behind the primary. For most reads, a few milliseconds of lag does not matter. But sometimes you need to read your own writes immediately:
# After updating a user, read from primary for fresh data
update_user(user_id, new_email)
user = get_user(user_id, use_primary=True) # Force primary read
This read/write separation is a core distributed systems pattern. It is formalized as CQRS (Command Query Responsibility Segregation), which takes the idea further by using separate data models for reads and writes.
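Here is a rough sketch of how the get_user(..., use_primary=True) call above could be wired to the DatabaseRouter - assuming a router instance of that class; the helper names are illustrative, not a specific library API:
def get_user(user_id, use_primary=False):
    # Force the primary when the caller has just written and needs fresh data;
    # otherwise spread the read across replicas.
    conn = router.write_connection() if use_primary else router.read_connection()
    return conn.execute("SELECT id, name, email FROM users WHERE id = %s", [user_id])

def update_user(user_id, new_email):
    # Writes always go to the primary
    conn = router.write_connection()
    conn.execute("UPDATE users SET email = %s WHERE id = %s", [new_email, user_id])

# Read-your-own-writes: update, then force the next read onto the primary
update_user(42, "new@example.com")
user = get_user(42, use_primary=True)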
Strategy 3: Query Optimization (Fix Queries Before Scaling Infrastructure)
A bad query on 10 shards is still a bad query. Before adding more hardware, fix your SQL.
Use EXPLAIN ANALYZE to find the problem:
EXPLAIN ANALYZE
SELECT m.*, u.name
FROM messages m
JOIN users u ON m.user_id = u.id
WHERE m.created_at > '2026-01-01'
ORDER BY m.created_at DESC
LIMIT 100;
If you see Seq Scan on a large table, you are missing an index. If you see a high Rows Removed by Filter count, the query is scanning far more rows than it returns.
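For the query above, an index that matches both the filter and the sort order lets the planner skip the sequential scan entirely. A sketch, assuming an open psycopg2-style connection conn to the database with that messages table:
# CONCURRENTLY builds the index without blocking writes, but it must run
# outside a transaction block - hence autocommit.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_created_at "
        "ON messages (created_at DESC)"
    )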
Three query patterns that kill performance at scale:
1. Functions on indexed columns:
-- BAD: index on created_at is ignored
WHERE DATE(created_at) = '2026-01-15'
-- GOOD: index on created_at is used
WHERE created_at >= '2026-01-15' AND created_at < '2026-01-16'
2. SELECT * when you only need 3 columns:
-- BAD: fetches all columns
SELECT * FROM users WHERE id = 123;
-- GOOD: only what you need
SELECT id, name, email FROM users WHERE id = 123;
3. N+1 queries (the silent killer):
# BAD: 101 queries
users = db.execute("SELECT * FROM users LIMIT 100")
for user in users:
    messages = db.execute("SELECT * FROM messages WHERE user_id = %s", [user.id])

# GOOD: 1 query
results = db.execute("""
    SELECT u.*, m.*
    FROM users u
    LEFT JOIN messages m ON m.user_id = u.id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
Index strategy matters too. Composite indexes must follow the leftmost prefix rule - a detail I see developers get wrong constantly. The column order in CREATE INDEX idx ON orders(customer_id, order_date) is not interchangeable. I wrote a full guide on this: How Database Indexing Works.
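As a quick illustration of the leftmost prefix rule with that orders index - hypothetical queries, but this is the standard planner behaviour:
# Index: CREATE INDEX idx ON orders (customer_id, order_date)

# Uses the index: filters on the leftmost column
db.execute("SELECT * FROM orders WHERE customer_id = %s", [42])

# Uses the index: leftmost column plus the next one
db.execute(
    "SELECT * FROM orders WHERE customer_id = %s AND order_date >= %s",
    [42, "2026-01-01"],
)

# Cannot use the index efficiently: the leftmost column is skipped,
# so the order_date part of the index is no help on its own
db.execute("SELECT * FROM orders WHERE order_date >= %s", ["2026-01-01"])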
Strategy 4: Caching (The Fastest Query is One You Don’t Make)
Not every request needs to hit the database. OpenAI uses Redis to cache frequently accessed data.
import json

# `redis` is a configured client and `db` a database handle, as elsewhere in this post
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)          # Cache hit: skip the database entirely
    user = db.execute("SELECT * FROM users WHERE id = %s", [user_id])
    redis.setex(f"user:{user_id}", 300, json.dumps(user))  # Cache 5 min
    return user
This pattern is cache-aside: check the cache first, fall back to the database, populate the cache on a miss. But the devil is in the details - invalidation, choosing TTLs, and deciding between cache-aside, read-through, and write-through.
There is no single right answer. The choice depends on your read/write ratio and consistency requirements.
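A common starting point, though, is write-then-invalidate: delete the cached key whenever the underlying row changes and let the next read repopulate it. A sketch using the same user:{id} key scheme as above:
def update_user_email(user_id, new_email):
    # 1. Write to the database first
    db.execute("UPDATE users SET email = %s WHERE id = %s", [new_email, user_id])

    # 2. Invalidate the cached copy; the next get_user() call repopulates it.
    #    Deleting is safer than updating the cache in place - under concurrency
    #    you risk writing a stale value that then lives for the full TTL.
    redis.delete(f"user:{user_id}")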
Strategy 5: Horizontal Sharding (When One Instance Is Not Enough)
When a single PostgreSQL instance cannot handle your data or traffic, you split data across multiple instances.
Shard Router
|-- Shard 1: Users 1-1M
|-- Shard 2: Users 1M-2M
|-- Shard 3: Users 2M-3M
|-- Shard N: Users 3M+
Each shard is a complete PostgreSQL instance with the same schema. A shard router determines which shard holds which data:
import zlib

class ShardRouter:
    def __init__(self, shard_connections):
        self.shards = shard_connections
        self.num_shards = len(shard_connections)

    def get_shard(self, user_id):
        # Use a stable hash: Python's built-in hash() is randomized per process,
        # which would route the same user to different shards on different servers
        return zlib.crc32(str(user_id).encode()) % self.num_shards

    def query(self, user_id, sql, params):
        shard_id = self.get_shard(user_id)
        return self.shards[shard_id].execute(sql, params)
Choosing the right shard key is critical:
| Good shard keys | Bad shard keys |
|---|---|
| user_id (even distribution) | created_at (hot spots on recent data) |
| tenant_id | country (uneven distribution) |
| organization_id | status (low cardinality) |
The tooling has matured significantly. You do not have to build sharding from scratch:
- Citus - PostgreSQL extension for transparent sharding (most popular for Postgres)
- Vitess - sharding middleware used by YouTube and Slack (built for MySQL, but the same pattern)
- pgcat - lightweight sharding with connection pooling built-in
My advice: Start with application-level sharding to understand your access patterns. Move to Citus when managing shards manually becomes painful.
Strategy 6: Connection Management at Scale
At massive scale, connection management becomes its own discipline:
-- Kill queries after 30 seconds
SET statement_timeout = '30s';
-- Kill idle transactions after 60 seconds
SET idle_in_transaction_session_timeout = '60s';
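Rather than trusting every client to set these, you can make them defaults so every new session inherits them - a sketch run once with an admin connection; the database and role names are placeholders:
# Every new session against this database inherits the timeouts
admin_db.execute("ALTER DATABASE appdb SET statement_timeout = '30s'")
admin_db.execute("ALTER DATABASE appdb SET idle_in_transaction_session_timeout = '60s'")

# Or scope them to the application role only
admin_db.execute("ALTER ROLE app_user SET statement_timeout = '30s'")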
And monitor ruthlessly:
-- Find long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '1 minute';
Strategy 7: Monitoring Everything
You cannot fix what you cannot see.
Essential metrics:
| Metric | Why it matters |
|---|---|
| Connection count | Approaching limits causes cascading failures |
| Query latency (p50, p95, p99) | Direct user experience impact |
| Replication lag | Data freshness on replicas |
| Cache hit rate | Database load indicator |
| Slow query count | Performance regressions |
Enable pg_stat_statements to track query performance across your entire workload:
-- Requires pg_stat_statements in shared_preload_libraries (needs a restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 queries by total execution time
SELECT query, calls, total_exec_time / 1000 as total_seconds, mean_exec_time as avg_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This is how you find the 3 queries responsible for 80% of your database load.
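Most of these numbers can be scraped directly from PostgreSQL's statistics views and shipped to whatever monitoring system you already run. A rough sketch of the collection side, assuming an open psycopg2-style connection conn to the primary:
import time

# Each entry: metric name -> SQL returning a single numeric value.
METRIC_QUERIES = {
    "connection_count": "SELECT count(*) FROM pg_stat_activity",
    "active_queries": "SELECT count(*) FROM pg_stat_activity WHERE state = 'active'",
    # Worst replica lag in bytes, measured from the primary
    "max_replication_lag_bytes": """
        SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)
        FROM pg_stat_replication
    """,
}

def collect_metrics(conn):
    metrics = {}
    with conn.cursor() as cur:
        for name, sql in METRIC_QUERIES.items():
            cur.execute(sql)
            metrics[name] = cur.fetchone()[0]
    return metrics

while True:
    print(collect_metrics(conn))  # swap print() for your metrics exporter
    time.sleep(15)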
The Complete Architecture
Here is how all 7 strategies fit together:
Application Servers (1,000+)
            |
            v
Redis Cache (check first, avoid DB when possible)
            |
            v  (cache miss)
PgBouncer (10,000 app connections -> 200 DB connections)
            |
            v
Shard Router (route by user_id)
           /  \
          /    \
     Shard 1    Shard 2
     Primary    Primary
      /   \      /   \
    R1     R2  R1     R2     (read replicas per shard)
Request flow:
- App checks Redis cache
- Cache miss goes to PgBouncer
- PgBouncer routes to correct shard
- Reads go to replicas, writes go to primary
- Result is cached in Redis for next time
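Pulled together in code, the read path looks roughly like this - reusing the redis client, the ShardRouter, and the cache-aside pattern sketched earlier (the names come from those sketches, not from a specific library):
import json

CACHE_TTL_SECONDS = 300

def fetch_user(user_id):
    # 1. Cache first: the fastest query is the one you never make
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss: the router picks the shard by user_id, and the shard's
    #    connection goes through PgBouncer to a replica (reads) or the primary (writes)
    user = shard_router.query(
        user_id,
        "SELECT id, name, email FROM users WHERE id = %s",
        [user_id],
    )

    # 3. Populate the cache for the next request
    redis.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(user))
    return user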
The Order Matters
This is the sequence most teams should follow:
- Connection pooling - easiest win, biggest immediate impact
- Query optimization - fix the SQL before throwing hardware at it
- Read replicas - handle the 80%+ read traffic
- Caching - reduce database load for hot data
- Horizontal sharding - only when a single instance is genuinely not enough
- Connection management - timeouts, limits, and hygiene
- Monitoring - should actually be #1, but most teams add it later
The key insight: PostgreSQL scales further than most people think. The database is rarely the problem. Bad queries, missing indexes, no connection pooling, no caching - those are the problems. Fix them and PostgreSQL will carry you to hundreds of millions of users.
For more engineering deep dives on how real systems work at scale, visit singhajit.com.