If you’re using large language models in real products, “the model gave a sensible answer” is not enough.

What you actually need is:

The model gave a sensible answer in a strict JSON structure that my code can json.loads() without exploding.

This article walks through a practical framework for turning messy, natural-language LLM outputs into machine-friendly structured JSON, using only prompt design. We'll cover:

  1. Why JSON beats free text for anything downstream code has to consume
  2. A 4-step pattern for "forced JSON" prompts
  3. The five most common JSON failure modes and how to debug them
  4. Three ready-to-use prompt templates for real scenarios


1. Why JSON? Moving from “human-readable” to “machine-readable”

By default, LLMs talk like people: paragraphs, bullet points, and the occasional emoji.

Example request:

“Compare three popular laptops and give me the core specs.”

A typical answer might be:

  1. Laptop A: 16-inch display, 2.5K resolution, Intel i7, about £1,300
  2. Laptop B: 14-inch, 2.2K, AMD R7, around £1,000
  3. Laptop C: 13.6-inch, Retina-style display, Apple M-series, ~£1,500

Nice for humans. Awful for code.

If you want to:

  - sort those laptops by price,
  - filter them by screen size, or
  - load them into a database,

…you’re forced to regex your way through free text. Any tiny format change breaks your parsing.

JSON fixes this in three ways:

1. Syntax is strict, parsing is deterministic

If the output is valid JSON, parsing is a solved problem.

2. Types are explicit

Numbers stay numbers and booleans stay booleans, so "price_gbp": 1299 needs no cleanup, unlike "about £1,300" in free text.

3. Nested structure matches real data

Think: user → order list → line items. JSON handles this naturally:

{
  "user": {
    "name": "Alice",
    "orders": [
      { "product": "Laptop", "price_gbp": 1299 },
      { "product": "Monitor", "price_gbp": 199 }
    ]
  }
}
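
Pulling values back out of that nested structure is then direct. A tiny sketch, where raw_output stands in for the JSON string above:

import json

data = json.loads(raw_output)  # raw_output: the JSON string shown above
first_price = data["user"]["orders"][0]["price_gbp"]  # -> 1299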

Example: natural language vs JSON

Free-text output:

“We compared 3 laptops. First, a 16" Lenovo with 2.5K display… second, a 14" HP… third, a 13.6" MacBook Air… prices roughly £1,000–£1,500…”

JSON output:

{
  "laptop_analysis": {
    "analysis_date": "2025-01-01",
    "total_count": 3,
    "laptops": [
      {
        "brand": "Lenovo",
        "model": "Slim 7",
        "screen": {
          "size_inch": 16,
          "resolution": "2.5K",
          "touch_support": false
        },
        "processor": "Intel i7",
        "price_gbp": 1299
      },
      {
        "brand": "HP",
        "model": "Envy 14",
        "screen": {
          "size_inch": 14,
          "resolution": "2.2K",
          "touch_support": true
        },
        "processor": "AMD Ryzen 7",
        "price_gbp": 1049
      },
      {
        "brand": "Apple",
        "model": "MacBook Air M2",
        "screen": {
          "size_inch": 13.6,
          "resolution": "Retina-class",
          "touch_support": false
        },
        "processor": "Apple M2",
        "price_gbp": 1249
      }
    ]
  }
}

Now your pipeline can do:

import json

data = json.loads(output)
for laptop in data["laptop_analysis"]["laptops"]:
    ...

No brittle parsing. No surprises.


2. A 4-step pattern for “forced JSON” prompts

Getting an LLM to output proper JSON isn’t magic. A robust prompt usually has four ingredients:

  1. Format instructions – “Only output JSON, nothing else.”
  2. A concrete JSON template – the exact keys and structure you expect.
  3. Validation rules – type constraints, required fields, allowed values.
  4. Few-shot examples – one or two “here’s the input, here’s the JSON” samples.

Let’s go through them.


Step 1 – Hard-lock the output format

You must explicitly fight the model’s “chatty” instinct.

Bad instruction:

“Please use JSON format, you can also add explanations.”

You will absolutely get:

Here is your analysis:
{
  ...
}
Hope this helps!

Your parser will absolutely die.

Use strict wording instead:

You MUST return ONLY valid JSON.

- Do NOT include any explanations, comments, or extra text.
- The output must be a single JSON object.
- If you include any non-JSON content, the result is invalid.

You can go even stricter by wrapping it:

[HARD REQUIREMENT]
Return output wrapped between the markers ---BEGIN JSON--- and ---END JSON---.
Outside these markers there must be NOTHING (no text, no spaces, no newlines).

Example:
---BEGIN JSON---
{"key": "value"}
---END JSON---

Then your code can safely extract the block between those markers before parsing.
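
In Python, that extraction step is only a few lines. A minimal sketch; the helper name extract_json is illustrative:

import json
import re

def extract_json(raw: str) -> dict:
    """Grab the block between the markers, then parse it."""
    match = re.search(r"---BEGIN JSON---\s*(.*?)\s*---END JSON---", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON block found between markers")
    return json.loads(match.group(1))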


Step 2 – Provide a JSON “fill-in-the-blanks” template

Don’t leave structure to the model’s imagination. Tell it exactly what object you want.

Example: extracting news metadata.

{
  "news_extraction": {
    "article_title": "",      // string, full headline
    "publish_time": "",       // string, "YYYY-MM-DD HH:MM", or null
    "source": "",             // string, e.g. "BBC News"
    "author": "",             // string or null
    "key_points": [],         // array of 3–5 strings, each ≤ 50 chars
    "category": "",           // one of: "Politics", "Business", "Tech", "Entertainment", "Sport"
    "word_count": 0           // integer, total word count
  }
}

Template design tips:

  - Use typed placeholders ("" / 0 / 0.0 / false / []) so the expected type of every field is visible at a glance.
  - Put the constraints in the inline comments: formats, allowed values, length limits.
  - Tell the model the // comments are guidance only and must not appear in its output (comments are not valid JSON).

This turns the model’s job into “fill in a form”, not “invent whatever feels right”.


Step 3 – Add lightweight validation rules

The template defines shape. Validation rules define what’s legal inside that shape.

Examples you can include in the prompt:

  - "publish_time must match YYYY-MM-DD HH:MM, or be null if unknown."
  - "category must be exactly one of the five listed values, spelled as shown."
  - "key_points must contain 3–5 items, each a string of at most 50 characters."
  - "word_count must be an integer, never a string."

You don’t need a full JSON Schema in the prompt, but a few clear bullets like this reduce errors dramatically.
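
It also pays to re-check those same rules in code after parsing. A minimal sketch against the news template above:

ALLOWED_CATEGORIES = {"Politics", "Business", "Tech", "Entertainment", "Sport"}

def validate_news(data: dict) -> list:
    """Return a list of rule violations; an empty list means the JSON is legal."""
    errors = []
    news = data.get("news_extraction", {})
    if news.get("category") not in ALLOWED_CATEGORIES:
        errors.append(f"bad category: {news.get('category')!r}")
    if not 3 <= len(news.get("key_points", [])) <= 5:
        errors.append("key_points must have 3-5 items")
    if not isinstance(news.get("word_count"), int):
        errors.append("word_count must be an integer")
    return errors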


Step 4 – Use one or two few-shot examples

Models learn fast by imitation. Give them a mini “input → JSON” pair that matches your task.

Example: news extraction.

Prompt snippet:

Example input article:

"[Tech] UK startup launches home battery to cut energy bills
Source: The Guardian  Author: Jane Smith  Published: 2024-12-30 10:00
A London-based climate tech startup has launched a compact home battery
designed to help households store cheap off-peak electricity and reduce
their energy bills..."

Example JSON output:
{
  "news_extraction": {
    "article_title": "UK startup launches home battery to cut energy bills",
    "publish_time": "2024-12-30 10:00",
    "source": "The Guardian",
    "author": "Jane Smith",
    "key_points": [
      "London climate tech startup releases compact home battery",
      "Product lets households store off-peak electricity and lower bills",
      "Targets UK homeowners looking to reduce reliance on the grid"
    ],
    "category": "Tech",
    "word_count": 850
  }
}

Then you append your real article and say:

“Now extract data for the following article. Remember: only output JSON in the same format as the example.”

This single example often bumps JSON correctness from “coin flip” to “production-ready”.
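
Assembling the final prompt is then plain string work. A sketch with illustrative names, where example_block holds the input/output pair above:

def build_prompt(example_block: str, article: str) -> str:
    """Stack the format rules, the few-shot example, and the real article."""
    return (
        "You MUST return ONLY valid JSON, no extra text.\n\n"
        f"{example_block}\n\n"
        "Now extract data for the following article. Remember: only output "
        "JSON in the same format as the example.\n\n"
        f"{article}"
    )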


3. Debugging JSON output: 5 common failure modes

Even with good prompts, you’ll still see issues. Here’s what usually goes wrong and how to fix it.


Problem 1 – Extra natural language before/after JSON

“Here is your result: { … } Hope this helps!”

Why it happens: chatty default behaviour; format instruction too soft.

How to fix:

  - Harden the wording: "Return ONLY valid JSON, nothing before or after."
  - Use the ---BEGIN JSON--- / ---END JSON--- markers from Step 1 and extract the block in code.
  - As a last resort, slice from the first { to the last } before parsing.


Problem 2 – Broken JSON syntax

Examples:

  - Trailing commas after the last element
  - Single quotes instead of double quotes
  - Unquoted keys
  - Unbalanced braces or brackets in long outputs

Fixes:

  1. Add a “JSON hygiene” reminder:

    JSON syntax rules:
    - All keys MUST be in double quotes.
    - Use double quotes for strings, never single quotes.
    - No trailing commas after the last element in an object or array.
    - Every opening { or [ must have a matching closing } or ].
    
  2. For very long/complex structures, generate in steps:

    • Step 1: output only the top-level structure.
    • Step 2: fill a particular nested array.
    • Step 3: add the rest.
  3. Add a retry loop in your code:

    • Try json.loads().

    • If it fails, send the error message back to the model:

      “Your previous JSON failed to parse with JSONDecodeError: …. Please correct the JSON and output a fixed version. Do not change the structure.”
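
Wired together, a minimal retry loop looks like this; call_model is a placeholder for whatever LLM client you use:

import json

def get_json(prompt: str, max_retries: int = 2) -> dict:
    """Parse model output as JSON, feeding parse errors back for a retry."""
    output = call_model(prompt)  # call_model: placeholder for your LLM API call
    for _ in range(max_retries):
        try:
            return json.loads(output)
        except json.JSONDecodeError as e:
            output = call_model(
                f"Your previous JSON failed to parse with JSONDecodeError: {e}. "
                f"Here is that output:\n{output}\n"
                "Please correct the JSON and output a fixed version. "
                "Do not change the structure."
            )
    return json.loads(output)  # last attempt; raises if still broken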


Problem 3 – Wrong data types

Examples:

  - price_gbp comes back as the string "£1,299" instead of the number 1299
  - Booleans come back as "yes"/"no" strings
  - Integers come back quoted ("42" instead of 42)

Fixes:

  - Use typed placeholders in the template (0, 0.0, false) so the expected type is visible.
  - Spell it out: "price_gbp is a number, no currency symbol, no quotes."
  - Coerce defensively in code (float(value), int(value)) for fields that matter.


Problem 4 – Missing or extra fields

Examples:

  - Optional fields silently dropped when the source text doesn't mention them
  - Extra invented fields the template never asked for

Fixes:

  - State it explicitly: "Include every key from the template; use null or 0 when information is missing. Do not add keys."
  - Compare the key set in code after parsing, as sketched below.
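
The key-set check is a few lines, using the news_extraction fields from Step 2:

EXPECTED_KEYS = {
    "article_title", "publish_time", "source", "author",
    "key_points", "category", "word_count",
}

def check_keys(news: dict) -> None:
    """Raise if the parsed object is missing keys or has invented ones."""
    missing = EXPECTED_KEYS - news.keys()
    extra = news.keys() - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")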


Problem 5 – Messy nested structures

This is where things like arrays of objects containing arrays go sideways.

Fixes:

  - Show the full nested structure in your few-shot example, not just the top level.
  - For deep structures, generate in steps (top level first, then each nested array).
  - Keep nesting shallow; two or three levels is usually enough.


4. Three ready-to-use JSON prompt templates

Here are three complete patterns you can lift straight into your own system.


Scenario 1 – E-commerce product extraction (for database import)

Goal: From a UK shop’s product description, extract key fields like product ID, category, specs, price, stock, etc.

Prompt core:

Task: Extract key product data from the following product description and return JSON only.

### Output requirements
1. Output MUST be valid JSON, no extra text.
2. Use this template exactly (do not rename keys):

{
  "product_info": {
    "product_id": "",        // string, e.g. "P20250201001"
    "product_name": "",      // full name, not abbreviated
    "category": "",          // one of: "Laptop", "Phone", "Appliance", "Clothing", "Food"
    "specifications": [],    // 2–3 core specs as strings
    "price_gbp": 0.0,        // number, price in GBP, e.g. 999.0
    "stock": 0,              // integer, units in stock
    "free_shipping": false,  // boolean, true if free delivery in mainland UK
    "sales_count": 0         // integer, total units sold (0 if not mentioned)
  }
}

3. Rules:
   - No "£" symbol in price_gbp, number only.
   - If no product_id mentioned, use "unknown".
   - If no sales info, use 0 for sales_count.

### Product text:
"..."

Example model output:

{
  "product_info": {
    "product_id": "P20250201005",
    "product_name": "Dell XPS 13 Plus 13.4" Laptop",
    "category": "Laptop",
    "specifications": [
      "Colour: Platinum",
      "Memory: 16GB RAM, 512GB SSD",
      "Display: 13.4" OLED, 120Hz"
    ],
    "price_gbp": 1499.0,
    "stock": 42,
    "free_shipping": true,
    "sales_count": 850
  }
}

In Python, it’s just:

import json

data = json.loads(model_output)
price = data["product_info"]["price_gbp"]
stock = data["product_info"]["stock"]

And you’re ready to insert into a DB.
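
For example, a minimal sqlite3 sketch; the table name and columns are illustrative:

import json
import sqlite3

data = json.loads(model_output)  # model_output: the raw JSON string from the model
info = data["product_info"]

conn = sqlite3.connect("shop.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS products "
    "(product_id TEXT, name TEXT, price_gbp REAL, stock INTEGER)"
)
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    (info["product_id"], info["product_name"], info["price_gbp"], info["stock"]),
)
conn.commit()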


Scenario 2 – Customer feedback sentiment (for ticket routing)

Goal: Take free-text customer feedback and turn it into structured analysis for your support system.

Template:

{
  "feedback_analysis": {
    "feedback_id": "",      // string, you can generate like "F20250201093001"
    "sentiment": "",        // "Positive" | "Negative" | "Neutral"
    "core_demand": "",      // 10–30 chars summary of what the customer wants
    "issue_type": "",       // "Delivery" | "Quality" | "After-sales" | "Enquiry"
    "urgency_level": 0,     // 1 = low, 2 = medium, 3 = high
    "keywords": []          // 3–4 noun keywords, e.g. ["laptop", "screen crack"]
  }
}

Rule of thumb for urgency:

  - 3 (high): product unusable, safety issue, or explicit refund/replacement demand
  - 2 (medium): degraded experience or delivery delay
  - 1 (low): general question or suggestion

Example output:

{
  "feedback_analysis": {
    "feedback_id": "F20250201093001",
    "sentiment": "Negative",
    "core_demand": "Request replacement or refund for dead-on-arrival laptop",
    "issue_type": "Quality",
    "urgency_level": 3,
    "keywords": ["laptop", "won't turn on", "replacement", "refund"]
  }
}

Your ticketing system can now:

  - Route the ticket to the right team from issue_type
  - Order the queue by urgency_level
  - Tag the ticket with keywords for search
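
A hedged routing sketch; the queue names are made up:

ISSUE_QUEUES = {
    "Delivery": "logistics-team",
    "Quality": "returns-team",
    "After-sales": "support-team",
    "Enquiry": "frontline-team",
}

def route_ticket(analysis: dict) -> tuple[str, bool]:
    """Return (queue, escalate) for a parsed feedback_analysis object."""
    fb = analysis["feedback_analysis"]
    queue = ISSUE_QUEUES.get(fb["issue_type"], "frontline-team")
    escalate = fb["urgency_level"] >= 3
    return queue, escalate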


Scenario 3 – Project task breakdown (for Jira/Trello import)

Goal: Turn a “website redesign” paragraph into a structured task list.

Template:

{
  "project": "Website Redesign",
  "tasks": [
    {
      "task_id": "T001",          // T + 3 digits
      "task_name": "",            // 10–20 chars, clear action
      "owner": "",                // "Product Manager" | "Designer" | "Frontend" | "Backend" | "QA"
      "due_date": "",             // "YYYY-MM-DD", assume project start 2025-02-01
      "priority": "",             // "High" | "Medium" | "Low"
      "dependencies": []          // e.g. ["T001"], [] if none
    }
  ],
  "total_tasks": 0                // number of items in tasks[]
}

Rules:

  - task_id values run sequentially: T001, T002, T003, …
  - Every due_date falls on or after the project start (2025-02-01).
  - dependencies may only reference task_ids that exist in tasks[].
  - total_tasks must equal the number of items in tasks[].

Example output (shortened):

{
  "project": "Website Redesign",
  "tasks": [
    {
      "task_id": "T001",
      "task_name": "Gather detailed redesign requirements",
      "owner": "Product Manager",
      "due_date": "2025-02-03",
      "priority": "High",
      "dependencies": []
    },
    {
      "task_id": "T002",
      "task_name": "Design new homepage and listing UI",
      "owner": "Designer",
      "due_date": "2025-02-08",
      "priority": "High",
      "dependencies": ["T001"]
    },
    {
      "task_id": "T003",
      "task_name": "Implement login and registration backend",
      "owner": "Backend",
      "due_date": "2025-02-13",
      "priority": "High",
      "dependencies": ["T001"]
    }
  ],
  "total_tasks": 3
}

You can then POST tasks into Jira/Trello with their APIs and auto-create all tickets.
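
A sketch of that import step; the endpoint URL and payload fields are placeholders, so check your tracker's API docs for the real ones:

import json
import requests

data = json.loads(model_output)  # model_output: the raw JSON string from the model

for task in data["tasks"]:
    resp = requests.post(
        "https://tracker.example.com/api/tasks",  # placeholder endpoint
        json={
            "title": task["task_name"],
            "assignee_role": task["owner"],
            "due": task["due_date"],
            "priority": task["priority"],
        },
        timeout=10,
    )
    resp.raise_for_status()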


5. From “stable JSON” to “production-ready pipelines”

To recap:

  1. Lock the output format: JSON only, ideally between explicit markers.
  2. Give a concrete template with typed placeholders and commented constraints.
  3. Add a few lightweight validation rules.
  4. Anchor everything with one or two few-shot examples.
  5. In code: extract, parse, validate, and retry on failure.

Once you can reliably get structured JSON out of an LLM, you move from:

“The AI wrote something interesting.”

to:

“The AI is now a machine in the pipeline: it reads text, outputs structured data, and my system just works.”

That’s the real unlock.