You've added AI to your app. The demo works beautifully. Your JSON comes back perfectly structured, and you ship it to production feeling like a genius. Then users start complaining.
"The chart is empty."
"The timeline is all scrambled."
"It just says 'TBD' everywhere."
Welcome to the reality of AI-generated JSON in production. It breaks. A lot. The research backs this up: LLMs achieve roughly an 82% success rate for JSON generation across diverse tasks. That means nearly 1 in 5 requests returns something your app can't use. I've been building SlideMaker, an AI presentation generator that creates slides with charts, timelines, funnels, and 30 different content types. After 50,000+ generations, running at 500-600 per day, I've learned exactly where AI JSON breaks and how to catch it before users do.
Here's the validation stack that got reliability above 95%.
The Problem is Worse Than You Think
JSON Mode ≠ Schema Compliance
First, let's clear up a dangerous misconception. When you enable "JSON mode" on GPT-4, Gemini, or Claude, you're getting a guarantee that the output will be syntactically valid JSON. Brackets will match. Quotes will be escaped. It will parse.
That's it.
JSON mode does NOT guarantee:
- Your required fields exist
- Values are the correct type
- Enums contain valid options
- The data makes logical sense
This distinction kills production apps. The JSON parses fine, so no error is thrown. But your frontend receives an object missing half the fields it needs, and things silently break.
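To see the gap concretely, here's a toy example (the field names and enum are illustrative, not from any particular schema). The JSON parses without complaint, but it's useless to the frontend:

import json

# Syntactically valid JSON -- this is all JSON mode guarantees.
response = '{"title": "Q3 Revenue", "chart_type": "sparkline"}'

data = json.loads(response)  # parses without raising

# Schema compliance is a separate question entirely:
print("chart_data" in data)                          # False -- required field missing
print(data["chart_type"] in {"bar", "line", "pie"})  # False -- enum value invented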
The Four Ways AI JSON Actually Breaks
After analyzing thousands of failed generations, I've categorized them into four buckets:
1. Missing Fields
You asked for title, body, and image_keywords. You got title and body. No error, just a missing field that crashes your image loader.
2. Wrong Types
Your schema expects data: [10, 20, 30]. The model returns data: "10, 20, 30". Valid JSON. Broken chart.
3. Invalid Enum Values
You specified chart_type must be one of: bar, line, pie. The model decides horizontal_bar sounds better. Your chart library doesn't agree.
4. Semantic Nonsense
This is the sneaky one. The JSON is structurally perfect. Every field exists. Every type is correct. But the meaning is wrong.
Real example from production: A funnel diagram where the values go UP instead of down.
{
  "type": "funnel",
  "stages": [
    {"label": "Visitors", "value": 100},
    {"label": "Leads", "value": 250},
    {"label": "Customers", "value": 500}
  ]
}
Syntactically perfect. Semantically absurd. Funnels go DOWN. That's why they're called funnels.
The model knows what a funnel is. It just doesn't always care.
Layer 1: Stop Trusting the Model
The first rule of AI JSON: validate everything.
Don't assume the model followed instructions. Check that every field you need actually exists.
def validate_required_fields(slide, required_fields):
    errors = []
    for field in required_fields:
        if field not in slide or not slide[field]:
            errors.append(f"Missing required field: {field}")
    return errors
Simple? Yes. Essential? Absolutely.
Type-Specific Validation
Different output types need different validation rules. A chart needs different fields than a timeline.
VALIDATION_RULES = {
    "chart": {
        "required_fields": ["title", "chart_type", "chart_data", "body"],
        "min_data_points": 3
    },
    "timeline": {
        "required_fields": ["title", "diagram_data"],
        "min_events": 4,
        "max_events": 6
    },
    "funnel": {
        "required_fields": ["title", "diagram_data"],
        "min_stages": 3,
        "max_stages": 5
    }
}
Then create a validator registry:
VALIDATORS = {
    "chart": ChartValidator,
    "timeline": TimelineValidator,
    "funnel": FunnelValidator,
    "bullet_points": BulletValidator,
}

def validate_slide(slide):
    slide_type = slide.get("type", "bullet_points")
    validator = VALIDATORS.get(slide_type, DefaultValidator)()
    return validator.validate(slide)
Each validator knows exactly what its type needs. No guessing.
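I haven't shown the validator classes themselves; here's a rough sketch of the shape a ChartValidator could take, reusing the rules table above. The {"labels": [...], "data": [...]} layout of chart_data is illustrative, not a documented schema:

class ChartValidator:
    """Checks everything a chart slide needs before it reaches the frontend."""

    def validate(self, slide):
        errors = []
        rules = VALIDATION_RULES["chart"]

        # Required fields for this type
        for field in rules["required_fields"]:
            if not slide.get(field):
                errors.append(f"Missing required field: {field}")

        # Semantic checks: labels and data points must line up
        # (assumes chart_data looks like {"labels": [...], "data": [...]})
        chart_data = slide.get("chart_data", {})
        labels = chart_data.get("labels", [])
        values = chart_data.get("data", [])
        if len(values) < rules["min_data_points"]:
            errors.append(f"Chart needs at least {rules['min_data_points']} data points")
        if len(labels) != len(values):
            errors.append("Label count must match data count")
        if not all(isinstance(v, (int, float)) for v in values):
            errors.append("Chart data must be numeric")

        return len(errors) == 0, errors

Each subclass owns its own rules, so adding a new content type means adding a new class, not editing one giant validation function.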
Layer 2: Semantic Validation
Here's where most validation systems stop—and where most production bugs hide. Fields exist. Types are correct. But is the data correct?
Real Semantic Rules That Catch Real Bugs
| Content Type | Validation Rule | Why It Matters |
| --- | --- | --- |
| Funnel | Values must decrease | It's a FUNNEL |
| Timeline | Events must be chronological | It's a TIMELINE |
| Chart | Label count must match data count | Or the chart breaks |
| Bullet Points | Each bullet needs title + body | Or it renders empty |
Funnel Validation Example
def validate_funnel(slide):
    stages = slide.get("diagram_data", {}).get("stages", [])

    # Check minimum stages
    if len(stages) < 3:
        return False, "Funnel requires at least 3 stages"

    # Check values decrease (this is the semantic part)
    values = [stage.get("value", 0) for stage in stages]
    for i in range(1, len(values)):
        if values[i] >= values[i-1]:
            return False, "Funnel values must decrease at each stage"

    # Check for flat funnels (all same value)
    if len(set(values)) == 1:
        return False, "All stages have identical values"

    return True, None
Timeline Validation Example
def validate_timeline(slide):
    events = slide.get("diagram_data", {}).get("events", [])
    if len(events) < 4:
        return False, "Timeline needs at least 4 events"

    # Check chronological order
    years = []
    for event in events:
        year_str = event.get("year", "")
        # Extract numeric year (handles "2020", "Q1 2020", "Jan 2020")
        year = extract_year(year_str)
        if year:
            years.append(year)

    if years != sorted(years):
        return False, "Timeline events must be in chronological order"

    return True, None
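extract_year is a small helper I haven't shown; a minimal sketch, assuming the year appears as a four-digit number somewhere in the string:

import re

def extract_year(year_str):
    """Pull a four-digit year out of strings like "2020", "Q1 2020", or "Jan 2020".

    Returns the year as an int, or None if nothing year-like is found.
    """
    match = re.search(r"\b(19|20)\d{2}\b", str(year_str))
    return int(match.group(0)) if match else None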
These checks catch the "perfect JSON, broken output" problem that JSON mode completely misses.
Layer 3: Catch the Lazy Model
Sometimes the model gets lazy. Instead of generating real content, it outputs placeholders.
{
  "title": "Key Benefits",
  "bullet_points": [
    {"title": "Benefit 1", "body": "Details to be added..."},
    {"title": "Benefit 2", "body": "TBD"},
    {"title": "Benefit 3", "body": "..."}
  ]
}
Structurally valid. Completely useless.
The Forbidden Content List
FORBIDDEN_CONTENT = [
    "tbd",
    "todo",
    "placeholder",
    "...",
    "xxx",
    "fill in",
    "to be determined",
    "coming soon",
    "insert here",
    "details about",
    "information about",
    "content about",
    "lorem ipsum"
]

def check_forbidden_content(text):
    text_lower = text.lower()
    for forbidden in FORBIDDEN_CONTENT:
        if forbidden in text_lower:
            return False, f"Contains placeholder content: '{forbidden}'"
    return True, None
Scan All Text Fields
Don't just check the title. Check everything:
def validate_content_quality(slide):
    errors = []

    # Check title
    title = slide.get("title", "")
    valid, error = check_forbidden_content(title)
    if not valid:
        errors.append(f"Title: {error}")

    # Check body
    body = slide.get("body", "")
    if body:
        valid, error = check_forbidden_content(body)
        if not valid:
            errors.append(f"Body: {error}")

    # Check bullet points
    for i, bullet in enumerate(slide.get("bullet_points", [])):
        bullet_body = bullet.get("body", "")
        valid, error = check_forbidden_content(bullet_body)
        if not valid:
            errors.append(f"Bullet {i+1}: {error}")

    return len(errors) == 0, errors
Layer 4: The Retry Loop That Actually Works
Here's the key insight: retrying with the same prompt gives the same failure. The model made a mistake for a reason. Maybe the schema wasn't clear. Maybe it prioritized brevity over completeness. Whatever the cause, blindly retrying won't fix it.
Error Context Injection
When validation fails, tell the model exactly what went wrong:
import json

def generate_with_retry(prompt, max_retries=2):
    for attempt in range(max_retries + 1):
        response = call_llm(prompt)
        try:
            data = json.loads(response)
            is_valid, errors = validate_slide(data)
            if is_valid:
                return data

            # Build retry prompt with specific errors
            if attempt < max_retries:
                error_context = "\n".join(f"- {e}" for e in errors)
                prompt = f"""
Previous attempt failed validation:
{error_context}

Please fix these specific issues and regenerate.

Original request:
{prompt}
"""
        except json.JSONDecodeError as e:
            if attempt < max_retries:
                prompt = f"""
Previous response was not valid JSON.
Error: {str(e)}

Please return ONLY valid JSON with no markdown or extra text.

Original request:
{prompt}
"""
    # All retries exhausted
    return None
Why This Works
The retry prompt now contains:
1. The specific validation errors
2. A direct instruction to fix those issues
3. The original request for context
This transforms a ~85% first-attempt success rate into 95%+ after retry.
Set a Retry Budget
Don't retry forever:
MAX_RETRIES = 2  # Total of 3 attempts

def should_retry(attempt, error_type):
    if attempt >= MAX_RETRIES:
        return False

    # Some errors aren't worth retrying
    if error_type == "rate_limit":
        return True   # Wait and retry
    if error_type == "context_too_long":
        return False  # Need to reduce input, not retry

    return True
After exhausting retries, fall back gracefully:
• Use a simpler output type
• Return a partial result with error indication
• Log for manual review
The retry mechanism isn't error handling. It's part of the system.
Bonus: The Constraint Family Trick
There's one more problem that validation alone doesn't solve: repetitive output.
Ask for a 10-slide presentation and sometimes you get 5 comparison slides. Each one is valid JSON. Each one passes validation. But the presentation is boring and repetitive.
Constraint Families
Group similar content types into families. Allow only ONE from each family per generation:
CONSTRAINT_FAMILIES = {
    "comparison": ["comparison_split", "before_after", "pros_cons"],
    "highlight": ["big_number", "stat_highlight", "quote"],
    "steps": ["numbered_steps", "process_flow"],
    "data": ["chart", "table"]
}

def check_family_constraints(slides):
    used_families = {}
    for slide in slides:
        slide_type = slide.get("type")
        for family, types in CONSTRAINT_FAMILIES.items():
            if slide_type in types:
                if family in used_families:
                    return False, f"Multiple {family} types: {used_families[family]} and {slide_type}"
                used_families[family] = slide_type
    return True, None
This ensures variety without manual intervention.
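How you wire this in is up to you. One option (build_outline_prompt and call_llm_outline are placeholder names, not functions from my codebase) is to run the check on the whole outline and reuse the Layer 4 error-context trick at the deck level:

def generate_outline_with_variety(topic, max_retries=2):
    # Hypothetical helpers: build_outline_prompt / call_llm_outline stand in
    # for whatever produces a list of slide dicts in your system.
    prompt = build_outline_prompt(topic)
    for attempt in range(max_retries + 1):
        slides = call_llm_outline(prompt)
        ok, error = check_family_constraints(slides)
        if ok:
            return slides
        if attempt < max_retries:
            # Same error-context injection as Layer 4, applied to the whole deck
            prompt = (f"{prompt}\n\nPrevious outline failed a variety check: {error}\n"
                      "Swap one of the duplicated types for a different content type.")
    return None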
The Complete Validation Stack
Here's everything together:
def validate_ai_output(slide, verbosity="balanced"):
    """
    Complete validation pipeline for AI-generated JSON.
    Returns (is_valid, error_list)
    """
    errors = []

    # Layer 1: Required fields
    slide_type = slide.get("type", "unknown")
    rules = VALIDATION_RULES.get(slide_type, {})
    for field in rules.get("required_fields", []):
        if field not in slide or not slide[field]:
            errors.append(f"Missing: {field}")

    # Layer 2: Type-specific semantic validation
    validator = VALIDATORS.get(slide_type)
    if validator:
        valid, semantic_errors = validator().validate(slide)
        if not valid:
            errors.extend(semantic_errors)

    # Layer 3: Content quality (forbidden patterns)
    valid, quality_errors = validate_content_quality(slide)
    if not valid:
        errors.extend(quality_errors)

    return len(errors) == 0, errors
Use it in your generation pipeline:
def generate_slide(topic, slide_type):
    prompt = build_prompt(topic, slide_type)
    result = generate_with_retry(prompt, max_retries=2)

    if result is None:
        # Log failure, use fallback
        log_generation_failure(topic, slide_type)
        return create_fallback_slide(topic)

    return result
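log_generation_failure and create_fallback_slide aren't shown above; here's a minimal sketch of what they might look like, assuming the frontend can always render a plain bullet_points slide:

import logging

logger = logging.getLogger("slide_generation")

def log_generation_failure(topic, slide_type):
    # Keep enough context to review the failure by hand later.
    logger.error("Generation failed after retries: topic=%r type=%r", topic, slide_type)

def create_fallback_slide(topic):
    # The simplest type the renderer can always display -- no charts, no diagrams.
    return {
        "type": "bullet_points",
        "title": topic,
        "bullet_points": [
            {"title": "Overview", "body": f"An overview of {topic}."}
        ],
        "is_fallback": True,  # lets the UI flag the slide for regeneration
    }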
The Mindset Shift
Stop thinking of validation as error handling. It's not.
Validation is part of the product.
The model is a probabilistic system. It will make mistakes. Your validation layer transforms that probabilistic mess into deterministic, reliable output.
JSON mode is step 1. Not the solution.
After implementing this stack across 50,000+ generations:
• First attempt success: ~85-90%
• After retry with error context: 95%+
• User-visible failures: <2%
That's the difference between a demo and a product.
Quick Reference: Validation Checklist
Before you ship AI JSON to production:
- Required field validation for each output type
- Type-specific validators with semantic rules
- Forbidden content scanning
- Smart retry with error context injection
- Retry budget (don't loop forever)
- Graceful fallback when retries exhausted
- Constraint families for output variety
Don't trust the model. Verify the model.
Building something with AI-generated structured data? I'd love to hear what validation challenges you've hit.