I spent six months watching about 40 people a day use something I built: an AI presentation maker called SlideMaker. It didn't look like growth; it looked like a flatline with a heartbeat. But I kept building, because I wanted to understand how AI systems behave in production, not in tutorials or demos, but with real users doing real work. Here's what I learned.

Why I Started

I wanted to learn. That's it. I'd been reading about LLMs, watching demos, and following tutorials. But there's a gap between "I called the OpenAI API" and "I have a system that works reliably for strangers on the internet." I wanted to close that gap.

 

So I built a presentation generator. You type a topic, and AI creates slides. Simple concept, surprisingly complex execution.


I chose presentations because the pain point is universal: everyone needs slides, and almost nobody enjoys making them. The output is also structured data, which turned out to be where all the hard problems live.

The First Six Months: 40 Per Day

For five to six months, my daily numbers looked like this:

Monday: 38
Tuesday: 42
Wednesday: 41
Thursday: 39
Friday: 44


Forty presentations per day. Sometimes 35. Sometimes 50. Never 100. I checked the analytics constantly. Refreshed dashboards. Looked for patterns that weren't there. Here's what I didn't do: quit. Not because I'm disciplined. Because those 40 daily users were teaching me things I couldn't learn any other way:

 

Real failure modes. My local testing never produced a funnel diagram where the values went UP instead of down. Production did. Multiple times.

Edge cases I never imagined. Users requesting presentations in languages I don't speak. Topics I'd never heard of. Requests that technically worked but produced nonsense.

The slow period wasn't wasted time. It was tuition.

The Technical Problems Nobody Warns You About

Problem 1: JSON Mode Doesn't Mean What You Think

The biggest misconception I had: "JSON mode guarantees my schema will be followed." Wrong. JSON mode guarantees valid JSON. Brackets match. Quotes are escaped. It parses. That's it. JSON mode does NOT guarantee that your schema is followed, that required fields are present, or that the values make any sense.


I learned this the hard way when users started seeing empty charts. The JSON was valid. The chart_data field was just... missing.
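The fix was to stop trusting JSON mode and validate every parsed response against my own schema. Here's a minimal sketch of the idea using Pydantic; the library choice and the model and field names are my illustration, not the app's exact schema:

```python
from pydantic import BaseModel, ValidationError

# Illustrative schema -- the field names are examples, not the real models.
class ChartData(BaseModel):
    labels: list[str]
    values: list[float]

class Slide(BaseModel):
    title: str
    chart_data: ChartData  # required: a missing field fails validation instead of rendering empty

def parse_slide(raw_json: str) -> Slide | None:
    """Return a validated Slide, or None so the caller can treat it as a failed generation."""
    try:
        return Slide.model_validate_json(raw_json)
    except ValidationError as err:
        # JSON mode gave us valid JSON, but not necessarily our schema.
        print(f"Schema validation failed: {err}")
        return None
```

A failed parse gets handled like any other bad generation: log it and retry, rather than shipping an empty chart to the user.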

Problem 2: AI Gets Lazy

Sometimes the model doesn't want to work. Instead of generating real content:

{
  "title": "Key Benefits",
  "bullet_points": [
    {"title": "Benefit 1", "body": "Details to be added..."},
    {"title": "Benefit 2", "body": "TBD"},
    {"title": "Benefit 3", "body": "..."}
  ]
}


Valid JSON. Completely useless. I built a forbidden content scanner that catches these patterns: "TBD", "lorem ipsum", "details here", "to be determined", "coming soon", "insert here". It runs on every text field. When it catches something, the system retries with explicit instructions: "Do not use placeholder content."
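Here's roughly what that scanner looks like. The pattern list comes from the failures above, but the function itself is a simplified sketch rather than the exact production code:

```python
import re

# Placeholder patterns the model falls back on when it gets lazy.
FORBIDDEN_PATTERNS = [
    r"\bTBD\b", r"lorem ipsum", r"details here", r"to be determined",
    r"coming soon", r"insert here", r"details to be added", r"^\.{3}$",
]
FORBIDDEN_RE = re.compile("|".join(FORBIDDEN_PATTERNS), re.IGNORECASE)

def find_placeholder_text(payload) -> list[str]:
    """Recursively scan every string field and return the offending values."""
    hits = []
    if isinstance(payload, str):
        if FORBIDDEN_RE.search(payload):
            hits.append(payload)
    elif isinstance(payload, dict):
        for value in payload.values():
            hits.extend(find_placeholder_text(value))
    elif isinstance(payload, list):
        for item in payload:
            hits.extend(find_placeholder_text(item))
    return hits
```

Anything it returns triggers the retry with the explicit "do not use placeholder content" instruction.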

Problem 3: Semantic Nonsense

This one took months to figure out. A user requested a funnel diagram. The AI returned:

{
  "type": "funnel",
  "stages": [
    {"label": "Visitors", "value": 100},
    {"label": "Leads", "value": 250},
    {"label": "Customers", "value": 500}
  ]
}


Valid JSON. Valid schema. Completely wrong. Funnels go DOWN. That's why they're called funnels. The values should decrease, not increase. The model knows what a funnel is. It just doesn't always care. I had to build semantic validators: rules that check whether the data makes logical sense, not just structural sense.
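Here's the shape of one such rule. It's a sketch rather than the exact production code, and the error strings are just examples of what gets fed back to the model:

```python
def validate_funnel(chart: dict) -> list[str]:
    """Semantic checks for funnel charts: the structure can be valid and still be nonsense."""
    errors = []
    stages = chart.get("stages", [])
    values = [stage.get("value") for stage in stages]
    if len(values) < 2:
        errors.append("A funnel needs at least two stages.")
    elif any(not isinstance(v, (int, float)) for v in values):
        errors.append("Every funnel stage needs a numeric value.")
    elif any(later > earlier for earlier, later in zip(values, values[1:])):
        arrow = " → ".join(str(v) for v in values)
        errors.append(f"Funnel values must decrease (got: {arrow})")
    return errors
```

Run against the response above, it produces exactly the kind of message that the retry logic in the next problem injects back into the prompt.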

These rules caught bugs that pure schema validation missed entirely.

Problem 4: Retrying Wrong

My first retry logic:

if validation_fails:
    retry()  # Same prompt, hope for different result


This is stupid. If the model made a mistake, it made it for a reason. Same prompt = same mistake. The fix: inject the specific error into the retry prompt.

if validation_fails:
    retry_with_context(f"""
    Previous attempt failed:
    - Funnel values must decrease (got: 100 → 250 → 500)
    
    Fix this specific issue.
    """)


This took the success rate from roughly 85% on the first attempt to over 95% after a single informed retry.
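Putting the pieces together, the generate/validate/retry loop ends up looking roughly like the sketch below. The callables stand in for the real model call, the validator stack, and the graceful fallback; only the structure is the point:

```python
from typing import Callable

MAX_RETRIES = 2  # the retry budget: after this, fail gracefully with a simpler output

def generate_with_validation(
    prompt: str,
    call_model: Callable[[str], dict],            # placeholder for whatever calls the LLM
    run_validators: Callable[[dict], list[str]],  # schema + placeholder + semantic checks
    fallback: Callable[[str], dict],              # placeholder for the simpler fallback output
) -> dict:
    """Generate, validate, and retry with the specific errors injected into the prompt."""
    current_prompt = prompt
    for _ in range(1 + MAX_RETRIES):
        result = call_model(current_prompt)
        errors = run_validators(result)
        if not errors:
            return result
        # Tell the model exactly what was wrong instead of hoping it guesses differently.
        error_list = "\n".join(f"- {e}" for e in errors)
        current_prompt = (
            f"{prompt}\n\nPrevious attempt failed:\n{error_list}\n\nFix these specific issues."
        )
    return fallback(prompt)
```

Passing the model call and validators in as functions is just to keep the sketch self-contained; the important parts are the bounded retry budget and the error injection.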

What I Built (Technically)

The stack is boring on purpose: well-understood, well-documented tools, nothing exotic.


I didn't choose this stack because it's optimal. I chose it because I understood it well enough to debug problems at 2am.

Month 10: Something Changed

I don't fully understand what happened. The growth wasn't gradual; it stepped up suddenly.

My theories (I can't prove any of these):


Compound quality. After months of fixing edge cases, the output became consistently good. Not sometimes good, consistently good. Maybe that's the threshold that triggers word of mouth.


Network effects with no network. Students tell classmates. Teachers share with departments. Each user is a potential distribution channel, but it takes time to accumulate enough users for this to matter.


The use case is universal. Everyone needs presentations. Students, teachers, employees, researchers, government workers. I didn't target a niche. I targeted a universal pain point.

The honest answer: I don't know. The product got better, time passed, and growth happened. Correlation isn't causation.

Keeping It Running

The system handles 500-600 generations daily. That's not an impressive scale by any measure, but it's enough to surface real operational challenges:

 

Usage limits. 15 presentations per day for logged-in users, 5 for anonymous. This prevents abuse without blocking legitimate use. Most people don't need 15 presentations in one day.


Validation overhead. Every generation runs through 4 validation layers. This adds latency but catches failures before users see them.


Retry budget. Maximum 2 retries per generation. After that, fail gracefully with a simpler output rather than infinite loops.


Progress streaming. Generation takes 15-30 seconds. Without progress feedback, users think it's broken. SSE streaming shows percentage completion and each slide as it's generated.
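The streaming itself is not complicated: one server-sent event per slide, carrying the percentage and the slide content. A minimal sketch, assuming a FastAPI backend; the framework, endpoint, and event shape are my illustration, not necessarily what the app uses:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

TOTAL_SLIDES = 10  # illustrative; the real count would come from the generated outline

async def generate_slide(topic: str, index: int) -> dict:
    """Stand-in for the real per-slide generation and validation step."""
    await asyncio.sleep(1)  # pretend each slide takes a while
    return {"index": index, "title": f"{topic}, part {index + 1}"}

@app.get("/generate")
async def generate(topic: str) -> StreamingResponse:
    async def event_stream():
        for i in range(TOTAL_SLIDES):
            slide = await generate_slide(topic, i)
            payload = {
                "progress": round((i + 1) / TOTAL_SLIDES * 100),  # percentage completion
                "slide": slide,                                   # each slide as it's generated
            }
            yield f"data: {json.dumps(payload)}\n\n"              # one SSE frame per slide
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

The frontend just listens with an EventSource and renders each slide as the frames arrive, so a 15-30 second generation never looks frozen.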

What Actually Gets Used

I expected students. I got students, but also teachers, employees, researchers, and government workers.


The common thread: everyone hates making presentations manually, regardless of context.

What I'd Do Differently

Instrument everything from day one. I have rough reliability numbers based on error logs and observation. I don't have precise metrics because I didn't build a logging infrastructure early. This makes it hard to know if changes actually improved things.
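Concretely, even one structured log line per generation would have been enough to answer "did that change actually help?" A minimal sketch; the field names are simply what I'd want to query later, not an existing logging setup:

```python
import json
import logging
import time

logger = logging.getLogger("generations")

def log_generation(topic: str, attempts: int, failed_checks: list[str], started_at: float) -> None:
    """Emit one structured log line per generation: enough to compute real success rates."""
    logger.info(json.dumps({
        "topic_length": len(topic),   # enough for analysis without storing user content
        "attempts": attempts,
        "failed_checks": failed_checks,
        "duration_s": round(time.time() - started_at, 2),
        "succeeded": len(failed_checks) == 0,
    }))
```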


Simpler output types first. I built support for 30 different slide types (charts, timelines, funnels, mind maps, process flows, etc.). I should have started with 5 and added more only when users asked.


Less time on features, more time on reliability. Early on, I kept adding capabilities. I should have focused harder on making the basic stuff work perfectly.

The Actual Lessons

Side projects teach things jobs don't. When something breaks at 2 a.m. and it's your problem, you learn differently than when you can escalate.


Slow growth is still growth. Those 40 daily users during the flat period were real people getting real value. The product improved even when metrics didn't.


Boring technology choices are correct. I never once regretted using well-understood, well-documented tools. I frequently debug AI behavior. I never debug framework weirdness.


Free removes friction. Being free meant students could share with classmates without asking for budget approval. Teachers could recommend it without procurement. Word of mouth became the entire growth strategy because there was nothing blocking it.


Production is the teacher. Tutorials show happy paths. Production shows everything else.

Where It Stands

50,000+ presentations created. 500-600 per day. Still growing at about 1.7x month over month.

I still work on it. Still fixing edge cases. Still learning. It was supposed to be a learning project. It still is. It just turned out that other people found it useful too.


If you're building something with AI-generated structured data, I wrote a separate piece on the specific JSON validation problems and how to solve them. The semantic validation layer alone saved me from countless user complaints.