The most dangerous database design mistake I've seen in startups isn't sloppy schemas or quick-and-dirty hacks. It's senior engineers doing everything "right"—enforcing full normalization, splitting everything into granular tables, and chasing third normal form too early.
I've built and scaled systems across both ends of the spectrum: monolithic enterprise backends in banks and big tech, and greenfield products in early-stage startups. Payment platforms were a recurring theme. In that world, data models shift constantly—new processors, new regulations, new pivots, sometimes on a weekly basis.
Applied without context, "correct" engineering quietly kills the most precious startup resource: velocity.
The problem isn't normalization itself. It's that this approach is built on assumptions startups don't have. Traditional database design works when two conditions are true:
- The business domain is stable and well understood
- The organization has the infrastructure to manage complex schemas safely
Enterprises have both. They operate with established domain models, DBAs, migration processes, review boards, and maintenance windows. More importantly, they have known context—requirements that change slowly and predictably.
Startups have neither. Your domain is still forming. You don't have organizational slack to coordinate multi-week migrations. What looks like a core entity today might be irrelevant next quarter.
What starts as responsible engineering becomes the biggest brake on progress. This post is the recipe I wish I'd had earlier: how to design databases that evolve as fast as your startup needs to—without descending into chaos.
Mental Model #1: Complex vs. Complicated
The root cause of the normalization trap is a domain mismatch. Enterprises and startups operate in fundamentally different problem spaces, and that difference demands different database strategies.
The Cynefin framework names this distinction precisely:
Complicated Domain (enterprises)
Enterprises operate in what’s called a Complicated domain. This means your requirements are stable and knowable. You can analyze upfront and get it right, and changes happen slowly and predictably.
Example: A bank knows exactly what payment processors it supports, what regulations apply, and what fields are required. These don’t change weekly. When they do change, there are teams to handle migrations safely.
In this context, normalize everything:
- Split payments into granular tables
- Enforce relationships with foreign keys
- Design the “correct” schema upfront
When your domain is stable, normalization gives you data integrity and optimized queries.
Complex Domain (startups)
Startups operate in a Complex domain. This means you don’t know what will matter until you try it. Requirements emerge through experimentation, and what seems core today might be irrelevant tomorrow.
Example: You launch with Stripe, but next month a client needs Adyen. Then PayPal. Then a new regulation changes what you need to store. Then you pivot to B2B, and processor requirements completely change.
You can’t predict this upfront, and you learn by shipping and observing what happens.
Database strategy should be designed for rapid change:
- Keep schema flexible
- Minimize coordination overhead
- Make changes in code, not migrations
When your domain is still forming, rigid normalization locks you into assumptions that might be wrong next month.
THE MENTAL MODEL
Before writing any code, ask: "Is my domain stable or still forming?"
Complicated (stable domain):
- You know your requirements and they won't change much
- Changes are infrequent and planned
Then normalize: Lock in structure and optimize for integrity.
Complex (forming domain):
- New requirements appear unpredictably
- You learn by shipping and observing
Then stay flexible: Optimize for evolution speed, not schema purity.
Mental Model #2: Aggregate Thinking
In the world of unknowns where domains are complex and unpredictable, stop thinking in normalized entities. Start thinking in domain aggregates. An aggregate is a cluster of related data that changes together as one atomic unit. Collapsing related entities into aggregates eliminates coordination overhead and lets your schema evolve with your domain.
Inspired by Domain-Driven Design (DDD), an aggregate is a group of related objects (entities and value objects) treated as a single unit for data changes. There’s one aggregate root - the entry point - that enforces invariants (business rules) within the cluster.
An example: Loan Applications
Normalization teaches you to split everything into tables. A "properly designed" loan application system looks like this:
loan_applications
├── applicant_personal_info
├── applicant_employment_info
├── applicant_income_sources
├── credit_check_results
├── debt_obligations
├── underwriting_decisions
├── approval_conditions
├── loan_limits
├── collateral_details
├── appraisal_info
├── compliance_checks
└── document_uploads
12 tables for one loan application. The coordination tax:
- Display an application summary? JOIN 8 tables (a sketch of that query follows this list)
- Update underwriting conditions? Coordinate changes across 3 tables
- Add a new loan product? ALTER 5 tables and coordinate the deployment
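To make the tax concrete, here is roughly what that summary read looks like against the normalized schema. Only a handful of the 12 tables are shown, and the join keys and selected columns are hypothetical, invented for this sketch.

-- Hypothetical summary query over the normalized schema:
-- every read fans out across the tables behind one application.
SELECT la.id,
       la.status,
       pi.full_name,
       ei.employer_name,
       cc.credit_score,
       ud.decision,
       ll.approved_amount
FROM loan_applications la
JOIN applicant_personal_info pi ON pi.application_id = la.id
JOIN applicant_employment_info ei ON ei.application_id = la.id
JOIN credit_check_results cc ON cc.application_id = la.id
JOIN underwriting_decisions ud ON ud.application_id = la.id
JOIN loan_limits ll ON ll.application_id = la.id
WHERE la.id = $1;

Every table you add to the aggregate is another join on every read path and another write to coordinate.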
But ask yourself: Do you ever update employment info independently of the loan application? Change credit check results outside the application context? Modify loan limits without touching the application?
The answer is no. They all change together, and they're not independent entities: they're parts of one loan application aggregate.
The Aggregate Approach
The shift: model your domain as aggregates, but represent them as JSON documents within a relational table.
You keep the best of the relational foundation: ACID transactions, relationships between aggregates, and query performance on indexed fields. But you store the aggregate's internal structure as a JSON document, giving you schema flexibility without sacrificing transactional guarantees.
CREATE TABLE loan_applications (
    id UUID PRIMARY KEY,
    -- promoted lookup fields (kept as real columns so they can be indexed)
    user_id UUID NOT NULL,
    loan_type TEXT NOT NULL,
    status TEXT NOT NULL,
    -- the aggregate itself
    data JSONB NOT NULL,
    -- technical fields
    version INTEGER NOT NULL DEFAULT 1,
    schema_version INTEGER NOT NULL DEFAULT 1,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);
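The promoted fields only pay off with ordinary indexes behind them. A minimal sketch (the index names are arbitrary):

-- B-tree indexes on the promoted lookup fields
CREATE INDEX idx_loan_applications_user_id ON loan_applications (user_id);
CREATE INDEX idx_loan_applications_status ON loan_applications (status);
CREATE INDEX idx_loan_applications_loan_type ON loan_applications (loan_type);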
1. No Joins
Application summary = one query. No coordination across tables.
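For example, a single SELECT on one row returns the whole summary; the JSONB paths below are illustrative, not a prescribed document shape.

-- The whole aggregate lives in one row.
SELECT id,
       status,
       data->'applicant'->>'full_name'    AS applicant_name,
       data->'underwriting'->>'decision'  AS decision,
       data->'limits'->>'approved_amount' AS approved_amount
FROM loan_applications
WHERE id = $1;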
2. Atomic Updates
Change underwriting conditions = update one JSONB field, one transaction. Single-row operations mean less contention, faster writes, and reduced deadlock risk in high-concurrency systems.
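A sketch of such an update, assuming the conditions live under an underwriting key in the document:

-- Replace the underwriting conditions in place: one row, one transaction.
UPDATE loan_applications
SET data = jsonb_set(data, '{underwriting,conditions}',
                     '["proof_of_income", "co_signer"]'::jsonb),
    updated_at = now()
WHERE id = $1;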
3. Zero-Migration Evolution
Do new loan products need different fields? Update application code, deploy. No ALTER TABLE, no coordination. Moreover, you control the change at the application layer: you can use different conversion strategies and make canary rollouts as easy as they can be.
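As a sketch, a hypothetical new product simply ships documents with extra fields; existing rows are untouched and nothing is ALTERed. (The product name and fields are invented; gen_random_uuid() assumes Postgres 13+ or the pgcrypto extension.)

-- A hypothetical new loan product carries new fields inside data;
-- no ALTER TABLE, no migration, old rows keep their old shape.
INSERT INTO loan_applications (id, user_id, loan_type, status, data, created_at, updated_at)
VALUES (gen_random_uuid(), $1, 'green_energy_loan', 'draft',
        '{"applicant": {"full_name": "Jane Doe"},
          "collateral": {"type": "solar_installation", "subsidy_program": "local_grant"}}'::jsonb,
        now(), now());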
But it comes with a price: you need to find good JSON frameworks and build scripts for backfills, JSONB extractions, and eventual normalization once parts of the domain stabilize.
4. Version Control
Keep a schema_version column to track evolution. Old applications keep their old structure; new applications use the new one. Backfill when convenient, not urgently, and canary deployments become possible for any data change you want to make.
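A backfill then becomes a plain batch UPDATE gated on schema_version. The specific transformation below (renaming a top-level key) is made up purely for illustration.

-- Lazy backfill: migrate old documents to the new shape when convenient.
UPDATE loan_applications
SET data = (data - 'employment')
           || jsonb_build_object('employment_info', data->'employment'),
    schema_version = 2
WHERE schema_version = 1
  AND data ? 'employment';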
5. Audit Trail for Free
For regulatory questions: need to see exactly what the application looked like when it was approved? The entire state is in one place, at one timestamp, so the audit trail is easy to produce.
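Assuming the row isn't mutated after approval, that snapshot is a one-row read:

-- The full state the decision was made on, in one place.
SELECT data, updated_at
FROM loan_applications
WHERE id = $1
  AND status = 'approved';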
6. Optimistic locks
Try implementing optimistic locking across 8 normalized tables. Now try it with one version column on a single aggregate.
Single-row updates with version checks are straightforward. Multi-table coordination is a nightmare.
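With the aggregate, the whole pattern is one conditional UPDATE; zero affected rows means someone else won the race and the caller retries.

-- Optimistic lock: the write succeeds only if the version we read is still current.
UPDATE loan_applications
SET data = $2::jsonb,
    version = version + 1,
    updated_at = now()
WHERE id = $1
  AND version = $3;  -- the version read earlier; 0 rows updated = conflict, retry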
7. Performance
Don't rely on JSONB indexes (like Postgres GIN) for query performance. Instead, extract high-traffic fields as indexed columns. Yes, this is data duplication, but it's explicit and optimized for your query patterns by design.
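One way to do the extraction in Postgres (12+) is a stored generated column over the JSONB path plus a plain B-tree index; the field below is illustrative. Alternatively, the application can write the promoted column itself, as with user_id and status above.

-- Promote a hot JSONB field into an indexed column.
ALTER TABLE loan_applications
  ADD COLUMN applicant_email TEXT
  GENERATED ALWAYS AS (data->'applicant'->>'email') STORED;

CREATE INDEX idx_loan_applications_applicant_email
  ON loan_applications (applicant_email);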
THE MENTAL MODEL
Traditional normalization optimizes for query flexibility in stable domains. Aggregate thinking optimizes for evolution speed in Complex domains.
Baseline
It all boils down to context. Enterprises have stable domains and migration infrastructure, so normalization is the right choice. Startups have neither, so aggregate thinking with JSONB is the right choice.
The mistake isn't choosing normalization, but applying enterprise practices before you have enterprise conditions.
Save the "proper" normalization for when your domain stabilizes. Until then, optimize for evolution speed.