You've added AI to your app. The demo works beautifully. Your JSON comes back perfectly structured, and you ship it to production feeling like a genius. Then users start complaining.


"The chart is empty."

"The timeline is all scrambled."

"It just says 'TBD' everywhere."


Welcome to the reality of AI-generated JSON in production. It breaks. A lot. The research backs this up: LLMs achieve roughly an 82% success rate for JSON generation across diverse tasks. That means nearly 1 in 5 requests returns something your app can't use. I've been building SlideMaker, an AI presentation generator that creates slides across 30 different content types, including charts, timelines, and funnels. After 50,000+ generations, running at 500-600 per day, I've learned exactly where AI JSON breaks—and how to catch it before users do.

Here's the validation stack that got reliability above 95%.

The Problem is Worse Than You Think

JSON Mode ≠ Schema Compliance

First, let's clear up a dangerous misconception. When you enable "JSON mode" on GPT-4, Gemini, or Claude, you're getting a guarantee that the output will be syntactically valid JSON. Brackets will match. Quotes will be escaped. It will parse.

That's it.


JSON mode does NOT guarantee:

•       That every field you asked for is present
•       That values have the types your schema expects
•       That enum values come from your allowed set
•       That the content actually makes sense

This distinction kills production apps. The JSON parses fine, so no error is thrown. But your frontend receives an object missing half the fields it needs, and things silently break.
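
To make it concrete, here's a minimal sketch (the payload is invented for illustration). The response parses cleanly, and a schema check still fails:

import json

# An assumed JSON-mode response: it parses, but most of the schema is missing
response = '{"title": "Q3 Results"}'
slide = json.loads(response)  # no exception -- JSON mode's promise ends here

required = ["title", "body", "image_keywords"]
missing = [f for f in required if f not in slide]
print(missing)  # ['body', 'image_keywords']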

The Four Ways AI JSON Actually Breaks

After analyzing thousands of failed generations, I've categorized them into four buckets:


1. Missing Fields

You asked for title, body, and image_keywords. You got title and body. No error, just a missing field that crashes your image loader.

2. Wrong Types

Your schema expects data: [10, 20, 30]. The model returns data: "10, 20, 30". Valid JSON. Broken chart.

3. Invalid Enum Values

You specified chart_type must be one of: bar, line, pie. The model decides horizontal_bar sounds better. Your chart library doesn't agree.
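
Buckets 2 and 3 are cheap to guard against explicitly. A minimal sketch, assuming the chart keeps its values under chart_data.data (the exact field layout here is my assumption):

ALLOWED_CHART_TYPES = {"bar", "line", "pie"}

def check_types_and_enums(slide):
    errors = []

    # Bucket 2: the data array must be an actual list, not a comma-separated string
    chart_data = slide.get("chart_data") or {}
    data = chart_data.get("data") if isinstance(chart_data, dict) else None
    if not isinstance(data, list):
        errors.append(f"chart_data.data must be a list, got {type(data).__name__}")

    # Bucket 3: the enum value must come from the allowed set
    chart_type = slide.get("chart_type")
    if chart_type not in ALLOWED_CHART_TYPES:
        errors.append(f"Invalid chart_type: {chart_type}")

    return errors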

4. Semantic Nonsense

This is the sneaky one. The JSON is structurally perfect. Every field exists. Every type is correct. But the meaning is wrong.


Real example from production: A funnel diagram where the values go UP instead of down.

{
  "type": "funnel",
  "stages": [
    {"label": "Visitors", "value": 100},
    {"label": "Leads", "value": 250},
    {"label": "Customers", "value": 500}
  ]
}

Syntactically perfect. Semantically absurd. Funnels go DOWN. That's why they're called funnels.

The model knows what a funnel is. It just doesn't always care.

Layer 1: Stop Trusting the Model

The first rule of AI JSON: validate everything.

Don't assume the model followed instructions. Check every field you need actually exists.

def validate_required_fields(slide, required_fields):
    errors = []
    for field in required_fields:
        if field not in slide or not slide[field]:
            errors.append(f"Missing required field: {field}")
    return errors

Simple? Yes. Essential? Absolutely.
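
For example, run it against a chart slide that came back incomplete (the required-field list matches the chart rules defined below):

slide = {"title": "Revenue by Quarter", "chart_type": "bar"}
errors = validate_required_fields(slide, ["title", "chart_type", "chart_data", "body"])
print(errors)
# ['Missing required field: chart_data', 'Missing required field: body']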

Type-Specific Validation

Different output types need different validation rules. A chart needs different fields than a timeline.

VALIDATION_RULES = {
    "chart": {
        "required_fields": ["title", "chart_type", "chart_data", "body"],
        "min_data_points": 3
    },
    "timeline": {
        "required_fields": ["title", "diagram_data"],
        "min_events": 4,
        "max_events": 6
    },
    "funnel": {
        "required_fields": ["title", "diagram_data"],
        "min_stages": 3,
        "max_stages": 5
    }
}


Then create a validator registry:


VALIDATORS = {
    "chart": ChartValidator,
    "timeline": TimelineValidator,
    "funnel": FunnelValidator,
    "bullet_points": BulletValidator,
}

def validate_slide(slide):
    slide_type = slide.get("type", "bullet_points")
    validator = VALIDATORS.get(slide_type, DefaultValidator)()
    return validator.validate(slide)

Each validator knows exactly what its type needs. No guessing.
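
The validator classes themselves aren't defined in this post; here's a sketch of one possible shape, assuming each exposes validate(slide) and returns (is_valid, error_list), reusing the rules table above:

class DefaultValidator:
    def validate(self, slide):
        # Fall back to the generic required-field check
        rules = VALIDATION_RULES.get(slide.get("type", ""), {})
        errors = validate_required_fields(slide, rules.get("required_fields", []))
        return len(errors) == 0, errors

class FunnelValidator(DefaultValidator):
    def validate(self, slide):
        valid, errors = super().validate(slide)
        stages = slide.get("diagram_data", {}).get("stages", [])
        if len(stages) < VALIDATION_RULES["funnel"]["min_stages"]:
            errors.append("Funnel requires at least 3 stages")
        return len(errors) == 0, errors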

Layer 2: Semantic Validation

Here's where most validation systems stop—and where most production bugs hide. Fields exist. Types are correct. But is the data correct?

Real Semantic Rules That Catch Real Bugs

| Content Type | Validation Rule | Why It Matters |
| --- | --- | --- |
| Funnel | Values must decrease | It's a FUNNEL |
| Timeline | Events must be chronological | It's a TIMELINE |
| Chart | Label count must match data count | Or the chart breaks |
| Bullet Points | Each bullet needs title + body | Or it renders empty |

Funnel Validation Example

def validate_funnel(slide):
    stages = slide.get("diagram_data", {}).get("stages", [])

    # Check minimum stages
    if len(stages) < 3:
        return False, "Funnel requires at least 3 stages"

    # Check values decrease (this is the semantic part)
    values = [stage.get("value", 0) for stage in stages]

    for i in range(1, len(values)):
        if values[i] >= values[i-1]:
            return False, "Funnel values must decrease at each stage"

    # Check for flat funnels (all same value)
    if len(set(values)) == 1:
        return False, "All stages have identical values"

    return True, None

Timeline Validation Example

def validate_timeline(slide):
    events = slide.get("diagram_data", {}).get("events", [])

    if len(events) < 4:
        return False, "Timeline needs at least 4 events"

    # Check chronological order
    years = []
    for event in events:
        year_str = event.get("year", "")
        # Extract numeric year (handles "2020", "Q1 2020", "Jan 2020")
        year = extract_year(year_str)
        if year:
            years.append(year)

    if years != sorted(years):
        return False, "Timeline events must be in chronological order"

    return True, None
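
extract_year isn't defined above; here's one way it could work (a sketch, assuming four-digit Gregorian years):

import re

def extract_year(year_str):
    # Pull the first four-digit year out of strings like "2020", "Q1 2020", "Jan 2020"
    match = re.search(r"\b(19|20)\d{2}\b", str(year_str))
    return int(match.group()) if match else None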

These checks catch the "perfect JSON, broken output" problem that JSON mode completely misses.
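
The same goes for the chart rule in the table above. A sketch, assuming the chart payload exposes labels and data lists under chart_data:

def validate_chart(slide):
    chart_data = slide.get("chart_data", {})
    labels = chart_data.get("labels", [])
    data = chart_data.get("data", [])

    if len(data) < 3:
        return False, "Chart needs at least 3 data points"

    if len(labels) != len(data):
        return False, f"Label count ({len(labels)}) must match data count ({len(data)})"

    return True, None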

Layer 3: Catch the Lazy Model

Sometimes the model gets lazy. Instead of generating real content, it outputs placeholders.

{
  "title": "Key Benefits",
  "bullet_points": [
    {"title": "Benefit 1", "body": "Details to be added..."},
    {"title": "Benefit 2", "body": "TBD"},
    {"title": "Benefit 3", "body": "..."}
  ]
}

Structurally valid. Completely useless.

The Forbidden Content List

FORBIDDEN_CONTENT = [
    "tbd",
    "todo",
    "placeholder",
    "...",
    "xxx",
    "fill in",
    "to be determined",
    "coming soon",
    "insert here",
    "details about",
    "information about",
    "content about",
    "lorem ipsum"
]

def check_forbidden_content(text):
    text_lower = text.lower()
    for forbidden in FORBIDDEN_CONTENT:
        if forbidden in text_lower:
            return False, f"Contains placeholder content: '{forbidden}'"
    return True, None
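
The lazy bullets from the example above fail immediately:

print(check_forbidden_content("Details to be added..."))       # (False, "Contains placeholder content: '...'")
print(check_forbidden_content("TBD"))                          # (False, "Contains placeholder content: 'tbd'")
print(check_forbidden_content("Cuts onboarding time by 40%"))  # (True, None)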

Scan All Text Fields

Don't just check the title. Check everything:

def validate_content_quality(slide):
    errors = []

    # Check title
    title = slide.get("title", "")
    valid, error = check_forbidden_content(title)
    if not valid:
        errors.append(f"Title: {error}")

    # Check body
    body = slide.get("body", "")
    if body:
        valid, error = check_forbidden_content(body)
        if not valid:
            errors.append(f"Body: {error}")

    # Check bullet points
    for i, bullet in enumerate(slide.get("bullet_points", [])):
        bullet_body = bullet.get("body", "")
        valid, error = check_forbidden_content(bullet_body)
        if not valid:
            errors.append(f"Bullet {i+1}: {error}")

    return len(errors) == 0, errors

Layer 4: The Retry Loop That Actually Works

Here's the key insight: retrying with the same prompt usually gives you the same failure. The model made a mistake for a reason. Maybe the schema wasn't clear. Maybe it prioritized brevity over completeness. Whatever the cause, blindly retrying won't fix it.

Error Context Injection

When validation fails, tell the model exactly what went wrong:

import json

def generate_with_retry(prompt, max_retries=2):
    for attempt in range(max_retries + 1):
        response = call_llm(prompt)
        try:
            data = json.loads(response)
            is_valid, errors = validate_slide(data)

            if is_valid:
                return data

            # Build retry prompt with specific errors
            if attempt < max_retries:
                error_context = "\n".join(f"- {e}" for e in errors)
                prompt = f"""
Previous attempt failed validation:
{error_context}

Please fix these specific issues and regenerate.

Original request:
{prompt}
"""
        except json.JSONDecodeError as e:
            if attempt < max_retries:
                prompt = f"""
Previous response was not valid JSON.
Error: {str(e)}

Please return ONLY valid JSON with no markdown or extra text.

Original request:
{prompt}
"""

    # All retries exhausted
    return None

Why This Works

The retry prompt now contains:

1. The specific validation errors

2. A direct instruction to fix those issues

3. The original request for context

This transforms an ~85% first-attempt success rate into 95%+ after retry.

Set a Retry Budget

Don't retry forever:

MAX_RETRIES = 2  # Total of 3 attempts


def should_retry(attempt, error_type):
    if attempt >= MAX_RETRIES:
        return False

    # Some errors aren't worth retrying
    if error_type == "rate_limit":
        return True  # Wait and retry
    if error_type == "context_too_long":
        return False  # Need to reduce input, not retry

    return True
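
The "wait and retry" branch needs an actual wait. A simple exponential backoff sketch (the delay values are assumptions, not from the article):

import time

def retry_delay(attempt, base_seconds=1.0):
    # 1s, 2s, 4s, ... between attempts that hit a rate limit
    return base_seconds * (2 ** attempt)

# Inside the retry loop (sketch):
# if error_type == "rate_limit" and should_retry(attempt, "rate_limit"):
#     time.sleep(retry_delay(attempt))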

After exhausting retries, fall back gracefully:

•       Use a simpler output type

•       Return a partial result with error indication

•       Log for manual review
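
create_fallback_slide appears in the final pipeline below but isn't defined in this post; a minimal sketch of the idea (the exact shape and the needs_review flag are assumptions):

def create_fallback_slide(topic):
    # Simplest output type, flagged so it can be reviewed later
    return {
        "type": "bullet_points",
        "title": topic,
        "bullet_points": [
            {"title": topic, "body": f"Overview of {topic}."}
        ],
        "needs_review": True,
    }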

The retry mechanism isn't error handling. It's part of the system.

Bonus: The Constraint Family Trick

There's one more problem that validation alone doesn't solve: repetitive output.

Ask for a 10-slide presentation and sometimes you get 5 comparison slides. Each one is valid JSON. Each one passes validation. But the presentation is boring and repetitive.

Constraint Families

Group similar content types into families. Allow only ONE from each family per generation:

CONSTRAINT_FAMILIES = {
    "comparison": ["comparison_split", "before_after", "pros_cons"],
    "highlight": ["big_number", "stat_highlight", "quote"],
    "steps": ["numbered_steps", "process_flow"],
    "data": ["chart", "table"]
}

def check_family_constraints(slides):
    used_families = {}

    for slide in slides:
        slide_type = slide.get("type")

        for family, types in CONSTRAINT_FAMILIES.items():
            if slide_type in types:
                if family in used_families:
                    return False, f"Multiple {family} types: {used_families[family]} and {slide_type}"

                used_families[family] = slide_type

    return True, None

This ensures variety without manual intervention.
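
For example, a deck that repeats the comparison family gets rejected:

slides = [
    {"type": "comparison_split"},
    {"type": "chart"},
    {"type": "pros_cons"},
]
print(check_family_constraints(slides))
# (False, 'Multiple comparison types: comparison_split and pros_cons')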

The Complete Validation Stack

Here's everything together:

def validate_ai_output(slide, verbosity="balanced"):
    """
    Complete validation pipeline for AI-generated JSON.
    Returns (is_valid, error_list)
    """
    errors = []

    # Layer 1: Required fields
    slide_type = slide.get("type", "unknown")
    rules = VALIDATION_RULES.get(slide_type, {})

    for field in rules.get("required_fields", []):
        if field not in slide or not slide[field]:
            errors.append(f"Missing: {field}")

    # Layer 2: Type-specific semantic validation
    validator = VALIDATORS.get(slide_type)
    if validator:
        valid, semantic_errors = validator().validate(slide)
        if not valid:
            errors.extend(semantic_errors)

    # Layer 3: Content quality (forbidden patterns)
    valid, quality_errors = validate_content_quality(slide)
    if not valid:
        errors.extend(quality_errors)

    return len(errors) == 0, errors

Use It in Your Generation Pipeline

def generate_slide(topic, slide_type):
    prompt = build_prompt(topic, slide_type)

    result = generate_with_retry(prompt, max_retries=2)

    if result is None:
        # Log failure, use fallback
        log_generation_failure(topic, slide_type)
        return create_fallback_slide(topic)

    return result

The Mindset Shift

Stop thinking of validation as error handling. It's not.

Validation is part of the product.

The model is a probabilistic system. It will make mistakes. Your validation layer transforms that probabilistic mess into deterministic, reliable output.

JSON mode is step 1. Not the solution.

After implementing this stack across 50,000+ generations:

•       First attempt success: ~85-90%

•       After retry with error context: 95%+

•       User-visible failures: <2%

That's the difference between a demo and a product.

Quick Reference: Validation Checklist

Before you ship AI JSON to production:

•       Check that every required field exists and is non-empty
•       Check types, not just presence
•       Check enum values against your allowed set
•       Add semantic rules (funnels decrease, timelines are chronological, labels match data)
•       Scan every text field for placeholder content
•       Retry with the specific validation errors injected into the prompt
•       Cap retries and fall back gracefully

Don't trust the model. Verify the model.


Building something with AI-generated structured data? I'd love to hear what validation challenges you've hit.