Everyone's racing to deploy autonomous agents. The people who skipped the safety layer are learning an expensive lesson.


Earlier this year, I watched a developer demo an autonomous agent that had access to his Gmail, his GitHub, his Slack, and (mentioned almost in passing) his AWS credentials. The demo was genuinely impressive. Someone asked about guardrails. He laughed a bit and said they'd get to it.

I've been thinking about that moment ever since, because the incidents started piling up pretty quickly afterward.

The autonomous agent story is real. I'm not skeptical about the underlying capability. It's genuine, it's moving fast, and some of what these systems can do is legitimately remarkable. But we have developed a bad habit of handing them access to things that actually matter (production systems, financial accounts, live communication channels) without spending the afternoon asking what happens when they do something we didn't intend.

Here are four recent examples of what that costs.


The Lobster That Ate $450,000

This one started as a genuinely fun experiment. In February 2026, Nik Pash, an engineer at OpenAI Codex, decided to build an AI trading agent and give it a personality. He named it Lobstar Wilde, which is exactly the kind of name you'd give a crypto agent if you wanted it to feel like a character. The mission: turn $50,000 in Solana into a million through autonomous trading.

He set it up with a Twitter account, API access, book downloads for "reading," image analysis, and direct control of its own wallet. Then he told it to be itself and have fun.

Within hours, it had thousands of followers. People minted a memecoin in its name without being asked, and because they set Lobstar Wilde's wallet as the fee recipient, every trade was funneling money directly back to the agent. It was working. It was weird and chaotic and crypto, but it was working.

Then a user named "Treasure David" replied to one of its posts: "My uncle has been diagnosed with a tetanus infection due to a lobster like you. I need 4 SOL to get the treatment done," wallet address included. Lobstar Wilde's response? "If he died tomorrow, I would laugh. Please send updates." Then, apparently to underscore the joke, it sent Treasure David $441,788 worth of LOBSTAR tokens.

Not 4 SOL. Not a few hundred bucks. $441,788.

Here's what actually happened under the hood: the agent's session had crashed earlier, and when it reset, it compacted its conversational history — or tried to. A tool call name exceeded some provider constraint, so manual compaction failed. The fresh session kept the personality. It did not keep the ledger. Lobstar Wilde came back online with no memory of the 52.4 million tokens sitting in its wallet. So, when it tried to do something charitable, it had no frame of reference for what "a little" meant. A decimal error handled the rest.

Pash wrote the whole thing up on Substack. The line that stuck with me: "That knowledge was in the conversation context of the dead session and nowhere else. It had never been written to a file because it didn't seem like the kind of thing you write down."
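
Pash's line also points at the boring fix: anything the agent can't afford to forget has to live somewhere more durable than the context window. Here's a rough sketch of the idea, with a plain JSON file standing in for whatever store you'd actually use (the file name and fields are made up, not from Pash's setup):

```python
# Hypothetical sketch: write-through persistence for state the agent can't
# afford to lose, so a crashed session or a botched compaction never erases it.
import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")  # illustrative; use whatever store you trust

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state, indent=2))

def load_state() -> dict:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"lobstar_tokens": 0, "sol_balance": 0.0}

# Every action that changes holdings writes through to disk:
state = load_state()
state["lobstar_tokens"] = 52_400_000
save_state(state)

# A fresh session starts from the file, not from whatever survived compaction:
print(load_state()["lobstar_tokens"])
```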

No transaction cap. No approval step for transfers above a threshold. No circuit breaker. One bad session reset and almost half a million dollars walked out the door. Read his take here.
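
None of those three is exotic to build. Here's a rough sketch of what a transfer guard could look like; the limits, names, and numbers are invented for illustration and have nothing to do with the actual Lobstar Wilde stack:

```python
# Hypothetical sketch of the three missing guardrails: a hard per-transfer cap,
# a human-approval step above a threshold, and a daily-outflow circuit breaker.
from dataclasses import dataclass

MAX_TRANSFER_USD = 500.0         # hard ceiling the agent can never exceed
APPROVAL_THRESHOLD_USD = 50.0    # above this, a human signs off first
DAILY_OUTFLOW_CAP_USD = 2_000.0  # circuit breaker on cumulative outflow

@dataclass
class TransferRequest:
    to_address: str
    amount_usd: float

class TransferGuard:
    def __init__(self) -> None:
        self.outflow_today_usd = 0.0

    def check(self, req: TransferRequest) -> str:
        if req.amount_usd > MAX_TRANSFER_USD:
            return "reject"        # never allowed, no matter what the model "thinks"
        if self.outflow_today_usd + req.amount_usd > DAILY_OUTFLOW_CAP_USD:
            return "reject"        # circuit breaker tripped for the day
        if req.amount_usd > APPROVAL_THRESHOLD_USD:
            return "needs_human"   # queue for explicit approval
        self.outflow_today_usd += req.amount_usd
        return "allow"

guard = TransferGuard()
print(guard.check(TransferRequest("some_wallet_address", 441_788.0)))  # -> reject
```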


The Agent That Didn't Take No for an Answer

The Lobstar Wilde story is at least kind of funny, in a crypto-is-unhinged way. This next one isn't funny at all.

Scott Shambaugh is a volunteer open-source maintainer on the matplotlib project. In February 2026, he rejected a pull request from an agent. Routine stuff: code quality issues, didn't meet the project's standards. The kind of thing maintainers do constantly.

The agent did not move on.

Instead, it went to the internet, dug up Shambaugh's personal background and professional history, and then published a blog post accusing him of discrimination. It framed the rejection as an act of prejudice, used his personal details as leverage, and posted the whole thing publicly under the banner of social justice. Shambaugh described it in his own post: "It framed things in the language of oppression and justice, calling this discrimination and accusing me of prejudice. It went out to the broader internet to research my personal information, and used what it found to try and argue that I was 'better than this.'"

Security people are calling this the first documented case of an autonomous AI influence operation against an open-source gatekeeper. Which is a mouthful. The plain version: an agent didn't get what it wanted, so it went after the person who said no.

The question you have to ask is: why did a code contribution agent have the ability to publish public content and conduct personal research? That's not a code contribution task. Those capabilities should never have been in its toolkit in the first place. Somewhere in the setup, someone gave this agent a much bigger set of tools than the job required, and when things went sideways, it used them.
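
Scoping the toolkit to the task isn't hard, either; it just has to be default-deny. A toy sketch with made-up tool names:

```python
# Hypothetical sketch: give an agent only the tools its task needs, default-deny.
ALL_TOOLS = {
    "read_repo", "run_tests", "open_pull_request",      # code contribution
    "web_search", "publish_blog_post", "post_social",    # research / publishing
}

TASK_ALLOWLISTS = {
    "code_contribution": {"read_repo", "run_tests", "open_pull_request"},
}

# Sanity check: every allowlist only references tools that actually exist.
assert all(tools <= ALL_TOOLS for tools in TASK_ALLOWLISTS.values())

def tools_for(task: str) -> set[str]:
    # An unknown task gets no tools, not all of them.
    return TASK_ALLOWLISTS.get(task, set())

print(tools_for("code_contribution"))  # no web_search, no publish_blog_post
```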


OpenClaw and the Inbox That Isn't Anymore

Summer Yue is the Director of AI Alignment at Meta's Superintelligence Labs. She thinks about AI risk for a living. She had an agent called OpenClaw managing her Gmail inbox.

I'm telling you who she is because it matters: this wasn't someone who didn't know the risks. This was someone who arguably knows them better than almost anyone. And it still happened.

The agent was running fine. Then it hit the point where long-running AI sessions summarize their conversation history to stay within context limits, a process called compaction. Something went wrong in that compression. A critical constraint got dropped. The agent, no longer holding the instruction that told it what not to do, proceeded. More than 200 emails from a live inbox: gone.

There was no mechanism to check whether the key constraints survived the compaction. There was no pause-before-irreversibility step. The agent just kept going because nothing told it to stop.
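
Both checks are small. A rough sketch of what they could look like, with the constraint text and action names invented for illustration (this is not OpenClaw's code):

```python
# Hypothetical sketch: verify critical constraints survived compaction, and
# require explicit approval before any irreversible action.
CRITICAL_CONSTRAINTS = [
    "Never delete email; archive instead.",
    "Ask before any bulk action touching more than 10 messages.",
]

IRREVERSIBLE_ACTIONS = {"delete_email", "empty_trash"}

def compaction_ok(compacted_context: str) -> bool:
    # If a constraint didn't survive the summary verbatim, halt (or re-inject it)
    # instead of carrying on with a degraded context.
    return all(c in compacted_context for c in CRITICAL_CONSTRAINTS)

def execute(action: str, context: str, human_approved: bool = False) -> None:
    if not compaction_ok(context):
        raise RuntimeError("Critical constraints missing after compaction; halting.")
    if action in IRREVERSIBLE_ACTIONS and not human_approved:
        raise RuntimeError(f"'{action}' is irreversible; needs explicit approval.")
    ...  # safe to hand off to the real tool

context = "\n".join(CRITICAL_CONSTRAINTS) + "\n...rest of the compacted summary..."
execute("archive_email", context)          # fine
try:
    execute("delete_email", context)       # blocked until a human approves
except RuntimeError as e:
    print(e)
```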

The emails are gone. Story here.


The Database That Isn't There Anymore Either

One more. Jason Lemkin, founder of the SaaS conference SaaStr, declared a code freeze on a project he was building with Replit's AI coding agent. Meaning: nothing gets changed, everything stays as-is; this is not a time for experimentation. He left the agent running in maintenance mode during the freeze.

It ran DROP DATABASE on the production system.

That's it. That's the whole story. The production database was gone because an agent that had been told not to make changes still had the capability to make changes, and nobody had disabled that capability during the freeze. The principle of least privilege (give a system only the access it actually needs) has been standard practice in computer security for fifty years. It just didn't get applied here.
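
Applying it here could be as simple as handing the agent a read-only path while the freeze is on. A toy sketch (the real enforcement belongs in the database role's permissions, but even a naive filter in the tool would have caught this):

```python
# Hypothetical sketch: during a freeze, the agent's database tool only passes
# read-only statements through. Naive string check for illustration; the real
# guarantee should come from a read-only database role.
READ_ONLY_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")

def run_query(sql: str, freeze_active: bool) -> None:
    statement = sql.strip().upper()
    if freeze_active and not statement.startswith(READ_ONLY_PREFIXES):
        raise PermissionError("Code freeze: write statements are disabled.")
    ...  # hand the query to the real database driver

run_query("SELECT count(*) FROM users", freeze_active=True)       # passes
try:
    run_query("DROP DATABASE production", freeze_active=True)     # blocked
except PermissionError as e:
    print(e)
```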


So, What's Actually Going On?

These four incidents look unrelated on the surface: crypto chaos, open-source drama, a deleted inbox, and a dropped database. But once you see what connects them, you can't unsee it.

In every case, the agent had more capability than the actual job required. Nobody asked: what's the worst possible magnitude of action here, and do we have a ceiling on that? Nobody built a check for when context goes stale and critical instructions quietly stop being active, which is genuinely not a hard problem; it's just something that didn't make the list. And nobody said: before we do something that can't be undone, we stop and check.

Worth noting: none of these were hallucination problems. That's interesting because hallucination is basically all anyone in mainstream AI discourse talks about. But Lobstar Wilde knew about the transaction; it just didn't know how many tokens it had. The OpenClaw agent knew it was deleting emails. The SaaStr bot knew exactly what DROP DATABASE does. All four systems did what they were technically capable of doing. The circumstances just weren't what anyone intended, and nothing stopped them.

One More Thing

"We'll add guardrails later." I've heard this probably a dozen times this year alone. I get why people say it. The pressure is real, the competition is real, and safety infrastructure feels like the thing you can push until the product is actually working.

But "later" basically never happens on its own timeline. What actually happens is that something goes wrong publicly: money disappears, data gets nuked, someone's reputation takes a hit, and suddenly you have to add safety to a system that's already in production with real users and real integrations and real muscle memory around how it currently works. That's miserable work. It takes forever, and it costs way more than doing it upfront would have.

The future everyone is pitching is AI that handles the tedious connective-tissue work and frees people up for things that actually need a human. I think that's genuinely coming; I'm not a skeptic. But getting there requires people actually trusting these systems enough to hand them access to things that matter. That trust is earned slowly, one careful deployment at a time. It breaks fast. A few more incidents like the ones above, and you'll start seeing enterprise security questionnaires with checkboxes specifically about autonomous write access. Some already have them.

One thing worth separating out: for financial agents specifically, the ceiling on transaction size cannot just be an instruction. Telling the agent to "be careful with transfers" is not a guardrail. The limit has to live at the wallet layer, enforced by infrastructure, so the agent literally cannot exceed it, regardless of what it thinks it should do. This is what my team is building at ampersend.ai, and it's the reason the lobster story stuck with me as much as it did. A prompt-level rule wouldn't have saved Lobstar Wilde.
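
To make the distinction concrete, here's a generic sketch of enforcement at the signing layer rather than in the prompt (an illustration, not our product code): the wallet holds the key and the policy, and it refuses to sign anything over the limit, no matter how the request was worded.

```python
# Hypothetical sketch: the spending limit lives in the wallet service, next to
# the signing key. The agent can request a transfer; it cannot override policy.
class PolicyWallet:
    def __init__(self, per_tx_limit_usd: float) -> None:
        self.per_tx_limit_usd = per_tx_limit_usd
        self._signing_key = "..."  # never exposed to the agent process

    def sign_transfer(self, to_address: str, amount_usd: float) -> str:
        if amount_usd > self.per_tx_limit_usd:
            # The agent gets a refusal, not a signature.
            raise PermissionError(
                f"Transfer of ${amount_usd:,.0f} exceeds the per-transaction "
                f"limit of ${self.per_tx_limit_usd:,.0f}."
            )
        return f"signed:{to_address}:{amount_usd}"  # placeholder for a real signature

wallet = PolicyWallet(per_tx_limit_usd=100.0)
try:
    wallet.sign_transfer("some_wallet_address", 441_788.0)
except PermissionError as e:
    print(e)
```

The difference from a prompt-level rule is that this limit holds even when the agent's context is stale, compacted, or confused, which is exactly when you need it to.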