Introduction

If you’ve wired up PostgreSQL → Debezium → Kafka and started consuming change events, you may have seen something weird in your payloads:

{
  "after": {
    "id": 123,
    "title": "__debezium_unavailable_value",
    "body": "__debezium_unavailable_value"
  }
}

You know those title and body columns have data in the database, but Debezium is emitting __debezium_unavailable_value instead.

If you’re trying to:

  - mirror the table into another database or warehouse,
  - keep a search index or cache in sync, or
  - build any derived state from full row images,

this placeholder can silently corrupt your downstream state.

This post explains:

  - what __debezium_unavailable_value actually means,
  - why PostgreSQL’s TOAST storage and REPLICA IDENTITY settings cause it, and
  - two ways to deal with it: at the database level, and in your consumers.

The Problem: CDC Events Missing Previous Values

Consider a table:

CREATE TABLE articles (
  id       BIGSERIAL PRIMARY KEY,
  title    TEXT,
  body     TEXT,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

You configure Debezium’s PostgreSQL connector to stream changes from articles into Kafka. For an UPDATE, you expect Debezium to send the new row state, including title and body.
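
Concretely, the connector registration might look something like this. This is a sketch assuming Debezium 2.x with the pgoutput plugin; the connector name, hostnames, credentials, database name, and slot name are all illustrative, and topic.prefix is set to cdc so that events land on the cdc.public.articles topic consumed later in this post.

{
  "name": "articles-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "app",
    "topic.prefix": "cdc",
    "table.include.list": "public.articles",
    "slot.name": "articles_slot"
  }
}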

Instead, you see events like:

{
  "op": "u",
  "after": {
    "id": 123,
    "title": "__debezium_unavailable_value",
    "body": "__debezium_unavailable_value",
    "updated_at":  "2025-01-01T10:00:00Z"
  },
  "before": {
    "id": 123,
    "title": "__debezium_unavailable_value",
    "body": "__debezium_unavailable_value",
    "updated_at": "2024-12-31T15:00:00Z"
  }
}
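
An event like this typically comes from an UPDATE that changed some columns but left the large ones untouched, for example (illustrative statement, consistent with the event above):

UPDATE articles
SET updated_at = now()
WHERE id = 123;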

Downstream, your consumer:

  - receives the literal string __debezium_unavailable_value instead of the real text,
  - has no reliable way to tell “unchanged” apart from “actually set to this string”, and
  - if it writes events to a sink naively, overwrites good data with the placeholder.

What’s going on?

What’s Really Happening: TOAST and Replica Identity

Two PostgreSQL features are colliding here:

1. TOAST (The Oversized-Attribute Storage Technique)

PostgreSQL compresses large column values (e.g., big TEXT or JSONB) and, if they are still too large, stores them out of line in a separate “TOAST” table so that rows fit within the 8 kB heap pages.

Key behavior:

  - Column values are TOASTed once a row exceeds roughly 2 kB (the default threshold).
  - On UPDATE, an unmodified TOASTed column is not rewritten; the new row version just points at the existing out-of-line value, so the content never enters the WAL record.
  - Logical decoding therefore sees the updated row without the contents of unchanged TOASTed columns.

Debezium represents those missing values as __debezium_unavailable_value, because it cannot reconstruct the actual content from the WAL alone.
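
If you want to verify that TOAST is in play, two quick diagnostics (plain PostgreSQL, nothing Debezium-specific):

-- pg_column_size() reports the stored (possibly compressed) size of a value
SELECT id, pg_column_size(body) AS body_bytes
FROM articles
ORDER BY body_bytes DESC NULLS LAST
LIMIT 5;

-- The TOAST table backing "articles", if one exists
SELECT t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'articles';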

2. Replica Identity

PostgreSQL controls how much “OLD” data is logged for replication via REPLICA IDENTITY.

By default:

ALTER TABLE articles REPLICA IDENTITY DEFAULT;

means:

  - only the primary key columns are recorded in the old (“before”) row image for UPDATEs and DELETEs, and
  - non-key columns, TOASTed or not, are simply absent from that image.

So with the default setting, Debezium never receives the previous values of title and body, no matter how the connector is configured.

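You can check a table’s current setting directly in pg_class:

-- relreplident: d = default (primary key), f = full, i = using index, n = nothing
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'articles';
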
Solution 1: Adjust REPLICA IDENTITY (Full or Index-Based)

If your use case really needs the previous values of specific columns, you can tell PostgreSQL to log more information.

Option A: REPLICA IDENTITY FULL

This logs the “before” image for all columns, including TOASTed ones.

ALTER TABLE public.articles
  REPLICA IDENTITY FULL;

Pros:

  - The before image contains every column, unchanged TOASTed values included, so Debezium can surface real values instead of the placeholder.
  - Deletes carry the full prior row, which helps auditing and downstream reconciliation.

Cons:

  - Noticeably more WAL volume and write amplification, especially for wide rows with large TOASTed columns.
  - More replication traffic, and potentially more replication lag, on update-heavy tables.

Option B: REPLICA IDENTITY USING INDEX

If you only care about some columns (e.g., title and body but not other big JSON fields), create a dedicated index and use it as replica identity:

-- 0) A replica identity index must be unique, non-partial, non-deferrable,
--    and cover only NOT NULL columns, so tighten the columns first
ALTER TABLE public.articles
  ALTER COLUMN title SET NOT NULL,
  ALTER COLUMN body  SET NOT NULL;

-- 1) Create an index on the columns you need in the "before" image
CREATE UNIQUE INDEX articles_replica_identity_idx
  ON public.articles (id, title, body);

-- 2) Use that index for replica identity
ALTER TABLE public.articles
  REPLICA IDENTITY USING INDEX articles_replica_identity_idx;
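
To confirm which index is now serving as the replica identity:

SELECT indexrelid::regclass AS replica_identity_index
FROM pg_index
WHERE indrelid = 'public.articles'::regclass
  AND indisreplident;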

Now, for updates and deletes:

  - PostgreSQL logs the old values of the indexed columns (id, title, body), and
  - all other columns stay out of the before image, so WAL overhead stays well below FULL.

This is a good compromise: you pay the logging cost only for the columns you actually need. One caveat: B-tree index entries are capped at roughly 2.7 kB (about a third of a page), so this approach only works while the indexed values stay below that limit; for genuinely huge TOASTed columns, fall back to FULL or to consumer-side handling.

Important: changing REPLICA IDENTITY has production DB impact. Test on a staging cluster and monitor WAL size / replication lag.
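
A simple way to watch how far the Debezium slot is falling behind after the change (one of several options; works on PostgreSQL 10+):

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;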


Solution 2: Handle __debezium_unavailable_value in Consumers

Sometimes you don’t actually need Debezium to re-deliver the unchanged TOASTed values; you just need to make sure the placeholder never overwrites good data.

In those cases, you can handle this in your CDC consumer.

Example: Python Kafka Consumer (Confluent Kafka)

Let’s say your sink DB already has the correct previous values for title and body, and you:

  - treat the placeholder as “this column did not change”,
  - read the current row from the sink, and
  - merge the incoming event into it instead of blindly overwriting.

Simplified Python consumer:

from confluent_kafka import Consumer
import json

# Debezium's default placeholder for unchanged TOASTed columns
UNAVAILABLE = "__debezium_unavailable_value"

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "articles-sync",
    "auto.offset.reset": "earliest",
})

consumer.subscribe(["cdc.public.articles"])

def merge_with_existing(existing_row, after_payload):
    """
    Merge the CDC 'after' payload into the existing row,
    keeping the sink's value wherever Debezium sent the placeholder.
    """
    merged = dict(existing_row)
    for col, val in after_payload.items():
        if val == UNAVAILABLE:
            # Unchanged TOASTed column: keep the existing value
            continue
        merged[col] = val
    return merged

# upsert_into_sink / read_from_sink / delete_from_sink stand in for
# your sink-specific persistence helpers
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print("Consumer error:", msg.error())
        continue
    if msg.value() is None:
        # Tombstone record emitted after a delete -- nothing to parse
        continue

    # Assumes plain JSON values without the schema envelope
    # (value.converter.schemas.enable=false)
    event = json.loads(msg.value())
    op = event.get("op")
    after = event.get("after")
    before = event.get("before")
    # 'after' is null for deletes, so fall back to 'before' for the key
    key = (after or before or {}).get("id")

    if op in ("c", "r"):  # insert, or snapshot read
        upsert_into_sink(after)
    elif op == "u":       # update
        existing = read_from_sink(key)
        merged = merge_with_existing(existing, after)
        upsert_into_sink(merged)
    elif op == "d":       # delete
        delete_from_sink(key)
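
To make the merge concrete, here is what merge_with_existing does with an event shaped like the earlier example (values illustrative):

existing = {"id": 123, "title": "Hello", "body": "Long text...",
            "updated_at": "2024-12-31T15:00:00Z"}
after = {"id": 123, "title": UNAVAILABLE, "body": UNAVAILABLE,
         "updated_at": "2025-01-01T10:00:00Z"}

merge_with_existing(existing, after)
# -> {"id": 123, "title": "Hello", "body": "Long text...",
#     "updated_at": "2025-01-01T10:00:00Z"}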

Key idea: treat __debezium_unavailable_value as “value unchanged”, never as data. Merge events into existing state instead of blindly overwriting it.

You can do similar logic in Java, Go, or wherever your consumer runs.

This approach is safe only if you know your sink always has the last good value. If you have consumers that start from an empty state or may miss events, you’ll need a reliable backfill/snapshot mechanism as well.
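
Debezium’s incremental snapshots are one such mechanism: you can re-snapshot a table on demand by inserting a row into a signaling table. A sketch, assuming a signal table named debezium_signal has been configured per the Debezium documentation:

INSERT INTO debezium_signal (id, type, data)
VALUES (
  'resnapshot-articles-1',
  'execute-snapshot',
  '{"data-collections": ["public.articles"]}'
);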

When to Use Which Approach?

Use REPLICA IDENTITY tweaks when:

  - downstream consumers genuinely need previous values (or full row images) for specific tables,
  - you control the source database and can absorb the extra WAL and replication overhead, and
  - you’d rather not push merge logic into every consumer.

Use consumer‑side handling when:

  - you can’t (or don’t want to) change the source database,
  - your sinks already hold the last known-good values, or
  - the write amplification of REPLICA IDENTITY FULL is unacceptable.

In many real systems, you end up using both: FULL or index-based identity on the handful of tables where history really matters, plus placeholder-aware merge logic everywhere as a safety net.

Takeaways

If you see __debezium_unavailable_value in your Debezium CDC stream, it’s not a bug; it’s PostgreSQL and Debezium being honest about what they don’t know.

To fix it:

  1. Understand TOAST and REPLICA IDENTITY.
  2. For tables where previous values matter, change REPLICA IDENTITY (FULL or USING INDEX) so Debezium can see what you need.
  3. For other tables, make your consumers ignore placeholders instead of overwriting valid data.

Do that, and your CDC pipelines become a lot more trustworthy and your downstream systems won’t be haunted by __debezium_unavailable_value ever again.