Introduction:
If you’ve wired up PostgreSQL → Debezium → Kafka and started consuming change events, you may have seen something weird in your payloads:
{
"after": {
"id": 123,
"title": "__debezium_unavailable_value",
"body": "__debezium_unavailable_value"
}
}
You know those title and body columns have data in the database, but Debezium is emitting __debezium_unavailable_value instead.
If you’re trying to:
- Keep a search index (e.g., Elasticsearch) in sync,
- Build an incremental aggregation (e.g., materialized view replacement),
- Or replicate rows to another store,
this placeholder can silently corrupt your downstream state.
This post explains:
- Why __debezium_unavailable_value happens
- How PostgreSQL’s TOAST and replica identity interact with Debezium
- Two practical ways to fix it, with concrete SQL and consumer code
The Problem: CDC Events Missing Previous Values
Consider a table:
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
title TEXT,
body TEXT,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
You configure Debezium’s PostgreSQL connector to stream changes from articles into Kafka. For an UPDATE, you expect Debezium to send the new row state, including title and body.
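For reference, a minimal connector registration might look roughly like this (hostnames, credentials, database name, and the topic prefix are placeholders; on Debezium 1.x the prefix property is database.server.name instead of topic.prefix):
{
  "name": "articles-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "appdb",
    "topic.prefix": "cdc",
    "table.include.list": "public.articles"
  }
}
With a topic prefix of cdc, changes to articles land on the cdc.public.articles topic used later in this post.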
Instead, you see events like:
{
"op": "u",
"after": {
"id": 123,
"title": "__debezium_unavailable_value",
"body": "__debezium_unavailable_value",
"updated_at": "2025-01-01T10:00:00Z"
},
"before": {
"id": 123,
"title": "__debezium_unavailable_value",
"body": "__debezium_unavailable_value",
"updated_at": "2024-12-31T15:00:00Z"
}
}
Downstream, your consumer:
- Overwrites the previous title/body with "__debezium_unavailable_value", or
- Crashes because it doesn’t expect that pseudo-value.
What’s going on?
What’s Really Happening: TOAST and Replica Identity
Two PostgreSQL features are colliding here:
1. TOAST (The Oversized-Attribute Storage Technique)
PostgreSQL stores large column values (e.g., big TEXT or JSONB) out of line in a separate “TOAST” table, so the main heap row stays within the fixed 8 kB page size.
Key behavior:
- When a row is updated and a large TOASTed column isn’t modified, PostgreSQL doesn’t rewrite that value; the new row version just keeps a pointer to the existing TOAST data.
- Logical decoding (which Debezium relies on) only sees what is written to the WAL, so unchanged TOASTed columns are missing from the “before/after” images it produces.
Debezium represents those missing values with the placeholder __debezium_unavailable_value, because it cannot reconstruct the actual content from the WAL alone. (The exact placeholder string is configurable via the connector’s unavailable.value.placeholder property.)
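You can see this in action by updating a row without touching the large columns; assuming title and body are big enough to have been TOASTed, the resulting change event carries placeholders for them:
-- title and body are not modified here, so their (TOASTed) values
-- never appear in the WAL record for this update
UPDATE public.articles
SET updated_at = now()
WHERE id = 123;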
2. Replica Identity
PostgreSQL controls how much “OLD” data is logged for replication via REPLICA IDENTITY.
By default:
ALTER TABLE articles REPLICA IDENTITY DEFAULT;
means:
- Only primary key columns are guaranteed in the “before” image for updates/deletes.
- Non‑PK columns (especially TOASTed ones) may be missing.
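You can check which setting a table currently has by looking at relreplident in pg_class (d = default, f = full, i = index, n = nothing):
SELECT relreplident
FROM pg_class
WHERE oid = 'public.articles'::regclass;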
Solution 1: Adjust REPLICA IDENTITY (Full or Index-Based)
If your use case really needs the previous values of specific columns, you can tell PostgreSQL to log more information.
Option A: REPLICA IDENTITY FULL
This logs the “before” image for all columns, including TOASTed ones.
ALTER TABLE public.articles
REPLICA IDENTITY FULL;
Pros:
- Debezium will see actual values instead of __debezium_unavailable_value.
- Your CDC events will always contain a complete picture of the row.
Cons:
- Increases WAL size and write overhead on the source database.
- Not ideal if you have many large columns and heavy write traffic.
Option B: REPLICA IDENTITY USING INDEX
If you only care about some columns (e.g., title and body but not other big JSON fields), you can create a dedicated unique index and use it as the replica identity. Note that a replica identity index must be unique, non-partial, non-deferrable, and cover only NOT NULL columns:
-- 1) Columns in a replica identity index must be NOT NULL
ALTER TABLE public.articles
ALTER COLUMN title SET NOT NULL,
ALTER COLUMN body SET NOT NULL;
-- 2) Create a unique index on the columns you need in the "before" image
CREATE UNIQUE INDEX articles_replica_identity_idx
ON public.articles (id, title, body);
-- 3) Use that index as the replica identity
ALTER TABLE public.articles
REPLICA IDENTITY USING INDEX articles_replica_identity_idx;
Now, for updates/deletes:
- PostgreSQL logs id, title, and body in the “before” image.
- Debezium can read the actual values, not placeholders.
This is a good compromise:
- More WAL than DEFAULT, but less than FULL
- Targets only the columns your downstream systems really need
Important: changing REPLICA IDENTITY has an impact on the production database. Test on a staging cluster first, and monitor WAL size and replication lag. Also note that btree index entries are limited to roughly one third of a page (about 2.7 kB), so if title or body can grow very large, the index-based option may not be usable and FULL may be your only choice.
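One simple way to keep an eye on this is to check how much WAL each replication slot is retaining:
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;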
Solution 2: Handle __debezium_unavailable_value in Consumers
Sometimes you don’t actually need the unchanged TOASTed values. You just need to avoid overwriting good data with a placeholder.
In those cases, you can handle this in your CDC consumer.
Example: Python Kafka Consumer (Confluent Kafka)
Let’s say your sink DB already has the correct previous values for title and body, and you:
- Only care about new values when they’re provided
- Want to ignore placeholders for unchanged fields
Simplified Python consumer (upsert_into_sink, read_from_sink, and delete_from_sink are sink-specific helpers you would implement for your own store):
from confluent_kafka import Consumer
import json
UNAVAILABLE = "__debezium_unavailable_value"
consumer = Consumer({
"bootstrap.servers": "kafka:9092",
"group.id": "articles-sync",
"auto.offset.reset": "earliest",
})
consumer.subscribe(["cdc.public.articles"])
def merge_with_existing(existing_row, after_payload):
    """
    Merge the CDC 'after' payload into the existing sink row,
    ignoring fields that Debezium marked as unavailable.
    """
    merged = dict(existing_row)
    for col, val in after_payload.items():
        if val == UNAVAILABLE:
            # Unchanged TOASTed column: keep the existing sink value
            continue
        merged[col] = val
    return merged
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print("Consumer error:", msg.error())
        continue
    if msg.value() is None:
        # Tombstone record (emitted after deletes for log compaction); skip it
        continue

    # Assumes the JSON converter with schemas disabled; with schemas enabled,
    # the Debezium envelope is nested under event["payload"]
    event = json.loads(msg.value())
    op = event.get("op")
    after = event.get("after") or {}
    before = event.get("before") or {}
    # For deletes the "after" image is null, so fall back to "before" for the key
    key = after.get("id", before.get("id"))

    if op in ("c", "r"):  # insert or snapshot read
        # write the full after image into the sink
        upsert_into_sink(after)
    elif op == "u":  # update
        existing = read_from_sink(key)
        merged = merge_with_existing(existing, after)
        upsert_into_sink(merged)
    elif op == "d":  # delete
        delete_from_sink(key)
Key idea:
- Treat __debezium_unavailable_value as “unchanged” and don’t overwrite sink values with it.
You can do similar logic in Java, Go, or wherever your consumer runs.
This approach is safe only if you know your sink always has the last good value. If you have consumers that start from an empty state or may miss events, you’ll need a reliable backfill/snapshot mechanism as well.
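If you use Debezium’s signal table for ad-hoc incremental snapshots, a backfill of this table can be triggered with a plain INSERT (this assumes a signal table has been configured via the connector’s signal.data.collection property; the table and id values here are illustrative):
INSERT INTO debezium_signal (id, type, data)
VALUES ('backfill-articles-1',
        'execute-snapshot',
        '{"data-collections": ["public.articles"]}');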
When to Use Which Approach?
Use REPLICA IDENTITY tweaks when:
- You truly need before values of certain columns (e.g., to compute deltas, decrement aggregates, or build exact audit logs).
- Your production DB can tolerate some WAL growth.
Use consumer‑side handling when:
- You don’t need the old values to compute correct results.
- You just want to prevent corrupting the sink with the placeholder.
- You want zero impact on the source DB.
In many real systems, you end up using both:
- Strengthen REPLICA IDENTITY only for a few critical tables/columns.
- Keep consumer logic robust against placeholders in less critical streams.
Takeaways
If you see __debezium_unavailable_value in your Debezium CDC stream, it’s not a bug; it’s PostgreSQL and Debezium being honest about what they don’t know.
To fix it:
- Understand TOAST and REPLICA IDENTITY.
- For tables where previous values matter, change REPLICA IDENTITY (FULL or USING INDEX) so Debezium can see what you need.
- For other tables, make your consumers ignore placeholders instead of overwriting valid data.
Do that, and your CDC pipelines become a lot more trustworthy and your downstream systems won’t be haunted by __debezium_unavailable_value ever again.