I spent six months watching about 40 people a day use something I built: an AI presentation maker called SlideMaker. It didn't look like growth; it looked like a flatline with a heartbeat. But I kept building, because I wanted to understand how AI systems behave in production, not in tutorials or demos, but with real users doing real work. Here's what I learned.

Why I Started

I wanted to learn. That's it. I'd been reading about LLMs, watching demos, and following tutorials. But there's a gap between "I called the OpenAI API" and "I have a system that works reliably for strangers on the internet." I wanted to close that gap.

 

So I built a presentation generator. You type a topic, and AI creates slides. Simple concept, surprisingly complex execution.


I chose presentations because the pain point is universal: everyone needs slides, and almost nobody enjoys making them. The output is also structured data, which turned out to be where all the hard problems live.

The First Six Months: 40 Per Day

For five to six months, my daily numbers looked like this:

Monday: 38
Tuesday: 42
Wednesday: 41
Thursday: 39
Friday: 44


Forty presentations per day. Sometimes 35. Sometimes 50. Never 100. I checked the analytics constantly. Refreshed dashboards. Looked for patterns that weren't there. Here's what I didn't do: quit. Not because I'm disciplined. Because those 40 daily users were teaching me things I couldn't learn any other way:

 

Real failure modes. My local testing never produced a funnel diagram where the values went UP instead of down. Production did. Multiple times.

Edge cases I never imagined. Users requesting presentations in languages I don't speak. Topics I'd never heard of. Requests that technically worked but produced nonsense.

The slow period wasn't wasted time. It was tuition.

The Technical Problems Nobody Warns You About

Problem 1: JSON Mode Doesn't Mean What You Think

The biggest misconception I had: "JSON mode guarantees my schema will be followed." Wrong. JSON mode guarantees valid JSON. Brackets match. Quotes are escaped. It parses. That's it. JSON mode does NOT guarantee that your schema is followed, that required fields are present, or that the values make any sense.


I learned this the hard way when users started seeing empty charts. The JSON was valid. The chart_data field was just... missing.
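The fix was to stop trusting JSON mode and validate every parsed response against my own schema. Here's a minimal sketch of the idea using Pydantic; the library choice and the model and field names are my illustration, not the app's exact schema:

```python
from pydantic import BaseModel, ValidationError

# Illustrative schema -- the field names are examples, not the real models.
class ChartData(BaseModel):
    labels: list[str]
    values: list[float]

class Slide(BaseModel):
    title: str
    chart_data: ChartData  # required: a missing field fails validation instead of rendering empty

def parse_slide(raw_json: str) -> Slide | None:
    """Return a validated Slide, or None so the caller can treat it as a failed generation."""
    try:
        return Slide.model_validate_json(raw_json)
    except ValidationError as err:
        # JSON mode gave us valid JSON, but not necessarily our schema.
        print(f"Schema validation failed: {err}")
        return None
```

A failed parse gets handled like any other bad generation: log it and retry, rather than shipping an empty chart to the user.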

Problem 2: AI Gets Lazy

Sometimes the model doesn't want to work. Instead of generating real content:

{
  "title": "Key Benefits",
  "bullet_points": [
    {"title": "Benefit 1", "body": "Details to be added..."},
    {"title": "Benefit 2", "body": "TBD"},
    {"title": "Benefit 3", "body": "..."}
  ]
}


Valid JSON. Completely useless. I built a forbidden content scanner that catches these patterns: "TBD", "lorem ipsum", "details here", "to be determined", "coming soon", "insert here". It runs on every text field. When it catches something, the system retries with explicit instructions: "Do not use placeholder content."
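Here's roughly what that scanner looks like. The pattern list comes from the failures above, but the function itself is a simplified sketch rather than the exact production code:

```python
import re

# Placeholder patterns the model falls back on when it gets lazy.
FORBIDDEN_PATTERNS = [
    r"\bTBD\b", r"lorem ipsum", r"details here", r"to be determined",
    r"coming soon", r"insert here", r"details to be added", r"^\.{3}$",
]
FORBIDDEN_RE = re.compile("|".join(FORBIDDEN_PATTERNS), re.IGNORECASE)

def find_placeholder_text(payload) -> list[str]:
    """Recursively scan every string field and return the offending values."""
    hits = []
    if isinstance(payload, str):
        if FORBIDDEN_RE.search(payload):
            hits.append(payload)
    elif isinstance(payload, dict):
        for value in payload.values():
            hits.extend(find_placeholder_text(value))
    elif isinstance(payload, list):
        for item in payload:
            hits.extend(find_placeholder_text(item))
    return hits
```

Anything it returns triggers the retry with the explicit "do not use placeholder content" instruction.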

Problem 3: Semantic Nonsense

This one took months to figure out. A user requested a funnel diagram. The AI returned:

{
  "type": "funnel",
  "stages": [
    {"label": "Visitors", "value": 100},
    {"label": "Leads", "value": 250},
    {"label": "Customers", "value": 500}
  ]
}


Valid JSON. Valid schema. Completely wrong. Funnels go DOWN. That's why they're called funnels. The values should decrease, not increase. The model knows what a funnel is. It just doesn't always care. I had to build semantic validators: rules that check whether the data makes logical sense, not just structural sense.
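Here's the shape of one such rule. It's a sketch rather than the exact production code, and the error strings are just examples of what gets fed back to the model:

```python
def validate_funnel(chart: dict) -> list[str]:
    """Semantic checks for funnel charts: the structure can be valid and still be nonsense."""
    errors = []
    stages = chart.get("stages", [])
    values = [stage.get("value") for stage in stages]
    if len(values) < 2:
        errors.append("A funnel needs at least two stages.")
    elif any(not isinstance(v, (int, float)) for v in values):
        errors.append("Every funnel stage needs a numeric value.")
    elif any(later > earlier for earlier, later in zip(values, values[1:])):
        arrow = " → ".join(str(v) for v in values)
        errors.append(f"Funnel values must decrease (got: {arrow})")
    return errors
```

Run against the response above, it produces exactly the kind of message that the retry logic in the next problem injects back into the prompt.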

These rules caught bugs that pure schema validation missed entirely.

Problem 4: Retrying Wrong

My first retry logic:

if validation_fails:
    retry()  # Same prompt, hope for different result


This is stupid. If the model made a mistake, it made it for a reason. Same prompt = same mistake. The fix: inject the specific error into the retry prompt.

if validation_fails:
    retry_with_context(f"""
    Previous attempt failed:
    - Funnel values must decrease (got: 100 → 250 → 500)
    
    Fix this specific issue.
    """)


This took the success rate from roughly 85% on the first attempt to over 95% after a single informed retry.
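Putting the pieces together, the generate/validate/retry loop ends up looking roughly like the sketch below. The callables stand in for the real model call, the validator stack, and the graceful fallback; only the structure is the point:

```python
from typing import Callable

MAX_RETRIES = 2  # the retry budget: after this, fail gracefully with a simpler output

def generate_with_validation(
    prompt: str,
    call_model: Callable[[str], dict],            # placeholder for whatever calls the LLM
    run_validators: Callable[[dict], list[str]],  # schema + placeholder + semantic checks
    fallback: Callable[[str], dict],              # placeholder for the simpler fallback output
) -> dict:
    """Generate, validate, and retry with the specific errors injected into the prompt."""
    current_prompt = prompt
    for _ in range(1 + MAX_RETRIES):
        result = call_model(current_prompt)
        errors = run_validators(result)
        if not errors:
            return result
        # Tell the model exactly what was wrong instead of hoping it guesses differently.
        error_list = "\n".join(f"- {e}" for e in errors)
        current_prompt = (
            f"{prompt}\n\nPrevious attempt failed:\n{error_list}\n\nFix these specific issues."
        )
    return fallback(prompt)
```

Passing the model call and validators in as functions is just to keep the sketch self-contained; the important parts are the bounded retry budget and the error injection.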

What I Built (Technically)

The stack is boring on purpose: well-understood, well-documented tools, nothing exotic.


I didn't choose this stack because it's optimal. I chose it because I understood it well enough to debug problems at 2am.

Month 10: Something Changed

I don't fully understand what happened. The growth wasn't gradual; it stepped up suddenly.

My theories (I can't prove any of these):


Compound quality. After months of fixing edge cases, the output became consistently good. Not sometimes good, consistently good. Maybe that's the threshold that triggers word of mouth.


Network effects with no network. Students tell classmates. Teachers share with departments. Each user is a potential distribution channel, but it takes time to accumulate enough users for this to matter.


The use case is universal. Everyone needs presentations. Students, teachers, employees, researchers, government workers. I didn't target a niche. I targeted a universal pain point.

The honest answer: I don't know. The product got better, time passed, and growth happened. Correlation isn't causation.

Keeping It Running

The system handles 500-600 generations daily. That's not an impressive scale by any measure, but it's enough to surface real operational challenges:

 

Usage limits. 15 presentations per day for logged-in users, 5 for anonymous. This prevents abuse without blocking legitimate use. Most people don't need 15 presentations in one day.


Validation overhead. Every generation runs through 4 validation layers. This adds latency but catches failures before users see them.


Retry budget. Maximum 2 retries per generation. After that, fail gracefully with a simpler output rather than infinite loops.


Progress streaming. Generation takes 15-30 seconds. Without progress feedback, users think it's broken. SSE streaming shows percentage completion and each slide as it's generated.
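The streaming itself is not complicated: one server-sent event per slide, carrying the percentage and the slide content. A minimal sketch, assuming a FastAPI backend; the framework, endpoint, and event shape are my illustration, not necessarily what the app uses:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

TOTAL_SLIDES = 10  # illustrative; the real count would come from the generated outline

async def generate_slide(topic: str, index: int) -> dict:
    """Stand-in for the real per-slide generation and validation step."""
    await asyncio.sleep(1)  # pretend each slide takes a while
    return {"index": index, "title": f"{topic}, part {index + 1}"}

@app.get("/generate")
async def generate(topic: str) -> StreamingResponse:
    async def event_stream():
        for i in range(TOTAL_SLIDES):
            slide = await generate_slide(topic, i)
            payload = {
                "progress": round((i + 1) / TOTAL_SLIDES * 100),  # percentage completion
                "slide": slide,                                   # each slide as it's generated
            }
            yield f"data: {json.dumps(payload)}\n\n"              # one SSE frame per slide
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

The frontend just listens with an EventSource and renders each slide as the frames arrive, so a 15-30 second generation never looks frozen.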

What Actually Gets Used

I expected students. I got students, but also teachers, employees, researchers, and government workers.


The common thread: everyone hates making presentations manually, regardless of context.

What I'd Do Differently

Instrument everything from day one. I have rough reliability numbers based on error logs and observation. I don't have precise metrics because I didn't build a logging infrastructure early. This makes it hard to know if changes actually improved things.
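Concretely, even one structured log line per generation would have been enough to answer "did that change actually help?" A minimal sketch; the field names are simply what I'd want to query later, not an existing logging setup:

```python
import json
import logging
import time

logger = logging.getLogger("generations")

def log_generation(topic: str, attempts: int, failed_checks: list[str], started_at: float) -> None:
    """Emit one structured log line per generation: enough to compute real success rates."""
    logger.info(json.dumps({
        "topic_length": len(topic),   # enough for analysis without storing user content
        "attempts": attempts,
        "failed_checks": failed_checks,
        "duration_s": round(time.time() - started_at, 2),
        "succeeded": len(failed_checks) == 0,
    }))
```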


Simpler output types first. I built support for 30 different slide types (charts, timelines, funnels, mind maps, process flows, etc.). I should have started with 5 and added more only when users asked.


Less time on features, more time on reliability. Early on, I kept adding capabilities. I should have focused harder on making the basic stuff work perfectly.

The Actual Lessons

Side projects teach things jobs don't. When something breaks at 2 a.m. and it's your problem, you learn differently than when you can escalate.


Slow growth is still growth. Those 40 daily users during the flat period were real people getting real value. The product improved even when metrics didn't.


Boring technology choices are correct. I never once regretted using well-understood, well-documented tools. I frequently debug AI behavior. I never debug framework weirdness.


Free removes friction. Being free meant students could share with classmates without asking for budget approval. Teachers could recommend it without procurement. Word of mouth became the entire growth strategy because there was nothing blocking it.


Production is the teacher. Tutorials show happy paths. Production shows everything else.

Where It Stands

50,000+ presentations created. 500-600 per day. Still growing at about 1.7x month over month.

I still work on it. Still fixing edge cases. Still learning. It was supposed to be a learning project. It still is. It just turned out that other people found it useful too.


If you're building something with AI-generated structured data, I wrote a separate piece on the specific JSON validation problems and how to solve them. The semantic validation layer alone saved me from countless user complaints.