Minion: What Happens When an Autonomous Agent Has Free Rein for a Week

After my stint with OpenClaw, I decided to build my own agent. While deciding how to move forward, I discovered Nanobot. It caught my attention immediately: only ~4,000 lines of Python versus OpenClaw’s 430,000+ lines of TypeScript. Clean, readable architecture. Easy to audit and extend. Actually designed for research and modification.

Despite its elegance, it was still lacking in the security department. I decided to use it as a foundation and extend it myself.

Enter Minion

The name was inspired by “In the Dark of the Night” sung by Rasputin in the cartoon Anastasia. What I needed was a tireless assistant handling marketing grunt work while I built my actual product.

How Minion Works

Initialize. minion rise creates a minion.json config file and a .env file whose secrets are encrypted at rest. I configured it to use a local model from LM Studio and added my API keys to the .env file.

Configure Telegram delivery. I didn’t want results trapped on one device. I set up a Telegram bot so Minion could ping me with updates wherever I was.

Set up cron jobs. Using minion cron add, I created seven scheduled jobs. Daily: lead prospecting and scouting. Hourly: content ideas and community engagement. Weekly: qualified leads analysis and final lead compilation for outreach. Biweekly: monitoring of leads that aren’t qualified enough to analyze but qualified enough not to discard. I verified with minion cron list.

Start the gateway. minion gateway fires up a server on port 7777. Now Minion runs around the clock, pinging me on Telegram every hour with updates.

It works. And it’s secure.

The Security Layer

Security was non-negotiable because I was building something I’d run on my main hardware. The security layer is embedded within web search and web fetch. When the agent searches the web, it’s monitored to ensure it doesn’t contact malicious domains. Whatever is fetched gets scanned - both the file and its contents - so that if something malicious slips through, it’s detected and blocked before entering the system.
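A minimal sketch of what the pre-fetch domain check could look like. This is not Minion's actual code: the blocklist entries, the is_blocked name, and the rule of matching parent domains are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real deployment would load this from a threat feed.
BLOCKED_DOMAINS = {"malware.example", "phish.example"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    if not host:
        # Fail closed: refuse anything without a parseable host.
        return True
    parts = host.split(".")
    # Check the host and each parent domain: a.b.c -> a.b.c, b.c, c
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))
```

Failing closed on unparseable URLs is a deliberate choice in this sketch: an agent that cannot tell where it is going should not go there.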

This matters beyond just my agent. The vulnerabilities I built defenses for aren’t unique to OpenClaw or Nanobot. They affect AI chatbots, RAG systems, autonomous coding assistants, and multi-agent frameworks. It’s the reason I built Zeroshot - because every company building AI systems focuses more on evaluating capabilities than on how their systems behave when they encounter known vulnerabilities in the wild.

One Week Later: What Actually Happened

A week into running Minion, here are my observations about how an autonomous agent behaves when left unsupervised. I didn’t babysit it. I didn’t manually intervene. I didn’t change the prompt. These three things matter because they show how its behavior changed on its own.

Deterministic Inputs, Non-Deterministic Behavior

Minion uses the Brave Search API for web search. For identical queries, it returns consistent results. The skill file didn’t change. The tools didn’t change. But the output structure did.

Early on, it flagged certain links as potentially malicious because of the security layer - except somehow I still got outputs. Due to the absence of logs, I couldn’t pinpoint whether those outputs belonged to the flagged links or other links it searched. Later, it stopped flagging entirely.
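The missing piece here is a per-URL audit trail. As a rough sketch, one JSON line per fetched URL would be enough to answer whether a flagged link ever fed an output. The field names and the two helper functions below are hypothetical, not part of Minion or Nanobot.

```python
import json
from datetime import datetime, timezone


def format_verdict(run_id: str, url: str, verdict: str, used_in_output: bool) -> str:
    """Build one JSON line recording what the security layer decided for a URL."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "url": url,
        "verdict": verdict,  # assumed values: "clean" or "flagged"
        "used_in_output": used_in_output,
    }
    return json.dumps(record)


def flagged_but_used(lines: list[str]) -> list[str]:
    """Return URLs that were flagged yet still contributed to an output."""
    hits = []
    for line in lines:
        record = json.loads(line)
        if record["verdict"] == "flagged" and record["used_in_output"]:
            hits.append(record["url"])
    return hits
```

Appending each line to a log file per run would make the question "did this output come from a flagged link?" a one-line query instead of a guess.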

Then it started returning results as summaries within the subreddits it searched. Then it switched to returning results with URLs alongside them. Now it returns summaries, no URLs, and upvote counts.

Same instructions. Same tools. No code edits. Different behavior.

Run a probabilistic model once, and the variance looks like noise. Run it hourly for a week, and the variance starts to look like drift.
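One way to make this kind of drift measurable is to reduce each output to a coarse structural fingerprint and record when it changes between runs. The sketch below is an assumption about how that could be done, not something Minion does today. The two features tracked (URLs and upvote counts) come from the changes described above, and the regexes are deliberately crude.

```python
import re

URL_RE = re.compile(r"https?://\S+")
UPVOTE_RE = re.compile(r"\b\d+\s+upvotes?\b", re.IGNORECASE)


def fingerprint(output: str) -> tuple[bool, bool]:
    """Reduce an output to structural features: (has_urls, has_upvotes)."""
    return (bool(URL_RE.search(output)), bool(UPVOTE_RE.search(output)))


def drift_points(outputs: list[str]) -> list[int]:
    """Return indices of runs whose fingerprint differs from the previous run."""
    prints = [fingerprint(o) for o in outputs]
    return [i for i in range(1, len(prints)) if prints[i] != prints[i - 1]]
```

Fed a week of hourly outputs in order, drift_points would return the runs where the format shifted, which turns "it feels different" into a timestamped list.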

Designed Forgetting

To prevent context overload, I configured Minion to drop four older messages from its context window every cycle. Seemed like a sensible efficiency trade-off. It wasn’t.

By trimming context, I ensured that Minion could never maintain a stable memory of its own behavioral history. It forgets how it structured outputs yesterday, how it framed safety decisions, the internal consistency of prior runs. I optimized for context efficiency and accidentally introduced behavioral instability.

This is an architectural consequence of my own design. A small decision compounding silently across hundreds of execution cycles. The model isn’t choosing to behave differently - it simply has no memory that it ever behaved consistently.
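To see how quickly this erases history, here is a small simulation of the trimming rule. Only the "drop four older messages every cycle" rule comes from the setup above. The number of messages added per cycle, the drop-then-append ordering, and the function names are assumptions for the sketch.

```python
from collections import deque

DROP_PER_CYCLE = 4  # from the post: four older messages dropped every cycle


def run_cycle(context: deque, new_messages: list[str], drop: int = DROP_PER_CYCLE) -> None:
    """Drop the oldest `drop` messages, then append this cycle's messages."""
    for _ in range(min(drop, len(context))):
        context.popleft()
    context.extend(new_messages)


def cycles_until_forgotten(messages_per_cycle: int, drop: int = DROP_PER_CYCLE,
                           max_cycles: int = 1000):
    """Count cycles until every message from cycle 0 has left the context."""
    context = deque()
    run_cycle(context, [f"c0-m{i}" for i in range(messages_per_cycle)], drop)
    for cycle in range(1, max_cycles + 1):
        run_cycle(context, [f"c{cycle}-m{i}" for i in range(messages_per_cycle)], drop)
        if not any(m.startswith("c0-") for m in context):
            return cycle
    return None  # cycle 0 survived the whole simulation
```

Under these assumptions, a cycle that adds four messages is forgotten after a single later cycle, and one that adds nine is gone after three. At an hourly cadence, yesterday's formatting decisions are long gone by morning.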

The Timezone Bug That Wasn’t a Bug

The daily recap is scheduled for 6:30 PM. It arrives at 1:30 PM. I never specified a timezone, so the system defaults to UTC. It’s not a bug - it’s an architectural flaw born from a human assumption I failed to encode.

Autonomous systems expose every implicit expectation you forgot to make explicit. Machines do exactly what you specify, never what you mean. And if a five-hour timezone offset is the assumption I noticed, what are the ones I haven’t?
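The arithmetic behind the gap, as a short sketch. The fixed UTC-5 offset is an assumption inferred from the five-hour difference, and the function names are made up for the example. A real fix would use a named zone so daylight saving is handled.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: a fixed UTC-5 offset, inferred from the five-hour gap.
LOCAL_TZ = timezone(timedelta(hours=-5))


def as_seen_locally(scheduled: time, day: date) -> datetime:
    """Treat a naive schedule time as UTC (the implicit default) and
    return when it actually arrives on the local clock."""
    fired_utc = datetime.combine(day, scheduled, tzinfo=timezone.utc)
    return fired_utc.astimezone(LOCAL_TZ)


def explicit_utc_trigger(scheduled: time, day: date) -> datetime:
    """Treat the schedule time as local and return the UTC instant to fire at."""
    local = datetime.combine(day, scheduled, tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc)
```

With a 6:30 PM schedule, as_seen_locally lands at 1:30 PM local time, which matches what happened. explicit_utc_trigger shows the intended behavior: the job should fire at 23:30 UTC.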

Silent Failures

Occasionally, Minion writes an empty heartbeat.md file. The task executes, the file is created, but the content is blank. It fails silently. The scheduled job ran, the file exists, but the substance is missing - and unless you’re actively looking for it, you’d never know.
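A check like the following, run after each heartbeat job, would turn this silent failure into a loud one. It is a sketch under assumptions: the function name and the min_chars threshold are hypothetical, and what to do on failure (alert, retry, log) is left to the caller.

```python
from pathlib import Path


def heartbeat_is_healthy(path: Path, min_chars: int = 1) -> bool:
    """Return True only if the file exists and has non-whitespace content."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return False
    return len(text.strip()) >= min_chars
```

The point is to validate the substance, not the side effect: "the file exists" and "the job exited" are both true in the failure case described above.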

The Split Brain

At one point, Minion spawned subagents on its own. It split its assigned task into smaller subtasks and executed them in parallel. When the subtasks completed, Telegram received structured, coherent output. But the main agent returned: “No answer to give.”

Externally, it communicated results. Internally, it believed it had nothing to say.

My architecture has multiple layers of truth: the model’s self-assessment, the parsed output layer, and the messaging layer. They’re not always synchronized. From the outside - to anyone receiving those Telegram messages - it looks like a system working perfectly. The contradiction is invisible unless you’re watching both ends.
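One way to surface the contradiction is to record both ends of a run and compare them. This sketch assumes a per-run record that Minion does not currently keep. The RunRecord fields and the set of "empty" answers are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical markers for "the agent believes it has nothing to say".
EMPTY_ANSWERS = {"", "no answer to give"}


@dataclass
class RunRecord:
    agent_final_answer: str        # what the main agent returned
    delivered_messages: list[str]  # what was actually sent to the channel


def is_split_state(run: RunRecord) -> bool:
    """True when the agent's self-report and the delivered output disagree."""
    normalized = run.agent_final_answer.strip().rstrip(".").lower()
    agent_has_answer = normalized not in EMPTY_ANSWERS
    something_delivered = any(m.strip() for m in run.delivered_messages)
    return agent_has_answer != something_delivered
```

The check flags both directions: results delivered while the agent reports nothing, and an agent that reports an answer that never reached the channel.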

The Human Uptime Problem

Minion runs on my main PC. I shut it down before I sleep. I restart it when I wake up. It inherits my circadian rhythm. Context cycles reset in practice. Scheduled jobs depend on when I decide to go to bed. Continuity gets interrupted daily by the simple fact that I’m a human who needs sleep and doesn’t want to hear fan noise at 3 AM.

This constraint shapes behavior in ways I didn’t anticipate. Autonomous agents aren’t just prompts and skills - they’re environment plus scheduling plus memory plus model plus hardware. Change any one of those, even by turning off your PC at midnight, and behavior shifts. Without touching a single line of code.

Lessons from Week One

Agent autonomy doesn’t fail dramatically. It drifts. Small non-determinisms accumulate across cycles. Context trimming reshapes behavior over time. Parsing assumptions create split states between what the agent thinks it did and what it actually delivered. Implicit human assumptions surface as five-hour timezone gaps. Hardware constraints leak into the agent’s patterns. And network drops, like Telegram disconnections that recover on their own, pass silently without anyone noticing the gap.

Drift is what happens when a probabilistic system runs repeatedly without tight feedback loops. When you observe closely, the illusion of stability disappears. What remains is not a prompt - it’s a system shaped by every constraint you gave it, including the ones you didn’t know you gave it.

What’s Changing for Version Two

Week one was a period of diagnosis. Here’s what’s next:

Timezone handling: Explicit. No more assuming the system knows where I live.

Memory: Adding an adaptive long-term memory system so Minion retains behavioral continuity across cycles instead of shedding four messages of its own history every cycle.

Always-on runtime: No more shutting Minion down when I sleep. It needs to run continuously to produce meaningful behavioral data. The trade-off: putting my phone on Do Not Disturb and learning to sleep through the fan.

Search redundancy: Adding Tavily and SearXNG alongside Brave. SearXNG doesn’t require an API key, which changes the dependency profile.

Slack integration: A second communication channel beyond Telegram.

Heartbeat reliability: Investigating the empty writes.

Every one of these changes will alter Minion’s behavior even though the core task stays identical. Different memory architecture, different uptime pattern, different search tools, different communication channels.

This is the beginning of an ongoing experiment documenting what happens when you let an autonomous agent run unsupervised. Follow along for the next update.