Animation is too hard.

To make a 30-second clip, you usually need to learn Blender, hire voice actors, set keyframes by hand, and render for twelve hours. I wanted to make memes, shitposts, and storytelling clips, and I wanted them done in the time it takes to drink a coffee.

So, I built the Stickverse.

It is a scrappy, Python-based animation pipeline that turns a raw text story into a fully voiced, lip-synced (well, "lip-flapped") video. It uses Google's Gemini to parse the script, a CLI tool to record your own bad voice acting, and OpenCV to draw the frames.

Here is how I built it, and how you can use it to create your own episodes of the Stickverse.

The Architecture

The pipeline consists of three distinct Python scripts:

  1. The Parser (parser.py): Uses Gemini to turn a text block into a storyboard (JSON).
  2. The Director (director.py): A CLI tool that prompts you to record lines one by one.
  3. The Animator (animator.py): The engine that stitches audio and vector graphics into an MP4.

Let's break down the code.

Phase 1: The Screenwriter (Powered by Gemini)

The hardest part of procedural animation is structure. If I write a story, I don't want to manually tag every single line with { "character": "Steve" }.

I delegated this to the Gemini API. I feed it raw text, and via a system prompt, it returns clean JSON.

Here is the secret sauce in parser.py. The system prompt enforces strict rules so the animator doesn't crash later:

system_prompt = """
You are a Screenplay parsing engine. Convert the following raw text story into a structured JSON animation script.

RULES:
1. Identify every character speaking.
2. Identify scene changes (backgrounds).
3. Output JSON ONLY. No markdown, no explanations.

JSON STRUCTURE:
[
  { "type": "background", "description": "desert" },
  { "type": "speak", "character": "Walter", "text": "Jesse, we have to cook." }
]
"""

If I feed it a text file where I wrote "Walter yells at Jesse about API keys," Gemini breaks it down into the exact scenes and lines I need to record.
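
Even with "JSON ONLY" in the prompt, a model reply can come back wrapped in a markdown fence or missing a field, so it is worth checking the reply before the later stages touch it. Here is a minimal sketch of that check. The parse_storyboard name and the validation rules are my assumptions for illustration (they are not taken from parser.py); the only thing it relies on is the two step types shown in the prompt above.

```python
import json

# Assumption: only the two step types shown in the system prompt are valid.
REQUIRED_KEYS = {
    "background": ("description",),
    "speak": ("character", "text"),
}

FENCE = "`" * 3  # a markdown code fence, built this way to keep the sketch readable


def parse_storyboard(raw_reply: str) -> list[dict]:
    """Turn a raw model reply into a validated list of storyboard steps."""
    text = raw_reply.strip()

    # Strip a surrounding markdown fence if the model added one anyway.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)

    steps = json.loads(text)
    if not isinstance(steps, list):
        raise ValueError("expected a JSON array of steps")

    for i, step in enumerate(steps):
        kind = step.get("type") if isinstance(step, dict) else None
        if kind not in REQUIRED_KEYS:
            raise ValueError(f"step {i}: unknown type {kind!r}")
        missing = [key for key in REQUIRED_KEYS[kind] if not step.get(key)]
        if missing:
            raise ValueError(f"step {i}: missing {missing}")
    return steps
```

Failing loudly here is the design choice: a bad step raises a ValueError at parse time instead of surfacing as a confusing error halfway through a render.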

Phase 2: The Director (The Karaoke CLI)

Once we have a script_plan.json, we need audio.

I didn't want to use AI Text-to-Speech (TTS) because AI voices have no soul. They can't capture the panic of a man realizing he missed a deployment deadline.

I wrote director.py using the sounddevice library. It reads the JSON, clears the terminal, and acts like a teleprompter.

# snippet from director.py
print(f"🗣️  CHARACTER: {char.upper()}")
print(f"📝 LINE: \"{text}\"")

# It waits 15 seconds for you to get into character
# Then records automatically when you hit Ctrl+C

It saves every line as a separate .wav file (line_0_Walter.wav) and updates the JSON to link the text to the audio.
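
The linking step can be sketched without touching the microphone at all. The helper names below, the "audio" key, and the choice to number only spoken lines are assumptions on my part for illustration; the real director.py may store the path differently. Only the line_0_Walter.wav naming pattern comes from the project.

```python
import re


def audio_filename(index: int, character: str) -> str:
    """Build a name like line_0_Walter.wav from a line number and a character."""
    # Replace anything that is awkward in a filename with an underscore.
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", character)
    return f"line_{index}_{safe}.wav"


def link_audio(steps: list[dict]) -> list[dict]:
    """Return a copy of the storyboard with an "audio" path on each spoken line."""
    linked = []
    spoken = 0  # assumption: background steps do not consume a line number
    for step in steps:
        step = dict(step)  # copy so the caller's storyboard is left untouched
        if step.get("type") == "speak":
            step["audio"] = audio_filename(spoken, step["character"])
            spoken += 1
        linked.append(step)
    return linked
```

Running link_audio on the two-step example from the system prompt leaves the background step alone and attaches line_0_Walter.wav to Walter's line.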

Phase 3: The Animator (Math, Not Magic)

This is where the "Stickverse" comes alive. I didn't use a game engine. I used OpenCV (to draw lines) and MoviePy (to stitch frames).

The "Lip-Sync" Algorithm

I couldn't be bothered to train a neural network for lip-syncing. Instead, I used scipy.io.wavfile to analyze the volume of the audio file. The louder the audio is at a given frame, the wider the mouth opens.

# snippet from animator.py
import cv2

def draw_frame(t, character_speaking, mouth_open_amount):
    # ... drawing the body (img, x, and color are set up in this elided part) ...

    # Jaw drop in pixels: an amount of 1.0 lifts the head by 20 px
    gap = int(mouth_open_amount * 20)

    # Draw the top half of the head as a filled half-ellipse, lifted by `gap`
    top_center = (x, 300 - gap)
    cv2.ellipse(img, top_center, (50, 50), 0, 180, 360, color, -1)

The result is a flappy-headed aesthetic that looks like South Park meets a terminal window.
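
The mouth_open_amount value has to come from the audio somehow. Here is a minimal sketch of one way to get it: chop the samples into one chunk per video frame, take the RMS loudness of each chunk, and scale by the loudest chunk so the result lands between 0.0 and 1.0. The mouth_envelope name, the 24 fps default, and the choice of RMS are my assumptions, not necessarily what animator.py does. It expects mono samples; scipy.io.wavfile.read returns a (sample_rate, data) pair whose data can be passed straight in if the file is mono.

```python
import math


def mouth_envelope(samples, sample_rate, fps=24):
    """Return one mouth-open amount in [0.0, 1.0] per video frame."""
    n_frames = math.ceil(len(samples) * fps / sample_rate)
    levels = []
    for frame in range(n_frames):
        # Integer sample bounds for this frame; computing them from the frame
        # index avoids drift when sample_rate is not a multiple of fps.
        start = frame * sample_rate // fps
        end = (frame + 1) * sample_rate // fps
        chunk = samples[start:end]
        if len(chunk) == 0:
            levels.append(0.0)
            continue
        # Root-mean-square amplitude is a simple stand-in for loudness.
        mean_square = sum(float(v) ** 2 for v in chunk) / len(chunk)
        levels.append(math.sqrt(mean_square))

    peak = max(levels, default=0.0)
    if peak == 0.0:
        return [0.0] * len(levels)  # silence: the mouth stays shut
    return [level / peak for level in levels]
```

Each value in the returned list can then be handed to draw_frame as the mouth_open_amount for the matching frame. Normalizing per clip means a quiet recording still flaps the head all the way open, which suits the aesthetic.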

The Demo: "Breaking Stories"

https://youtu.be/vwO11W2zwRw?embedable=true

To test the system, I fed it a script about Walter and Jesse arguing about Hackernoon editorial standards.

The Prompt (that I totally followed to the letter):

"The scene takes place in a desert. Walter says: Jesse, we have to cook. These editors want 2 keys of articles by the end of the week."

The Result:

Join the Stickverse

The code is open source. It is messy, it is funny, and it works.

The beauty of the Stickverse is that it is strictly code-based.

You can find the repo here: https://github.com/damianwgriggs/The-Stickverse

Clone it, modify the draw_frame function, and create your own episode. We have to cook.

Genesis Art Engine Prompt for featured photo: A blazing hot desert with a radiating sun. Meant to fit the “Breaking Stories” theme.