A few weekends ago, I was tinkering around with OpenClaw, the AI agent that recently took the internet by storm. The goal was simple: understand its architecture and see what lessons I could extract from its codebase and apply to my own work with agents. As I explored its architecture, digging into everything from memory structure to its integrations with different services, I couldn’t help but notice how similar it felt to us.
My hope with this article is to draw on my observations and offer a new perspective on how the physical world might inform how we think about the future of AI.
Note on OpenClaw
For those unfamiliar with OpenClaw, it’s a portable AI agent built and open-sourced by Peter Steinberger to be your personal assistant. It can run on any computer or server and be configured to autonomously integrate with different services and perform different tasks.
Surprisingly, what’s made OpenClaw stand out hasn’t been any new underlying technological breakthrough but rather how well it has managed to integrate the different pieces that make an AI system feel, for lack of a better word, sentient: integrations with different systems like email and Telegram, long-term memory to recall information from past conversations, a heartbeat system to respond to different events, and much more.
The result is an AI agent that feels incrementally closer to Samantha from the sci-fi movie Her.
Shared Primitives
During my tinkering with OpenClaw, I noticed several parallels between agents and ourselves. The more I explored its architecture, the more each aspect, from the components that made up the agent to its surrounding environment, started to resemble something familiar. It felt a bit like watching Stranger Things, with a parallel to how The Upside Down exists alongside the real world, except here the physical and digital worlds began to mirror each other in increasingly blurred ways.
It turns out that many of the same primitives that make humans function appear in agents as well.
Let me explain:
- Thinking: The agent equivalent of a human brain is a large language model (LLM). Its ability to reason comes from being connected to an LLM (in my case, Claude). Whereas humans have brains that start at zero and evolve through experience to guide reasoning, agents rely on LLMs pre-trained on many petabytes of data to guide their decision-making, with part of their reasoning influenced by real-time experiences preserved through memory. It’s not unreasonable to think that in the near future we will have self-training agents that continuously scrape the internet and take sensory feedback from the real world as more data to be trained upon.
- Memory: Beyond reasoning, I noticed something else that was rather interesting: the agent could recall long-term memories, but that ability faded as conversations and memories accumulated. This is an obvious one for those of us deep in the weeds on the technical limitations of LLMs. For all the precision of mathematics and computer science, agents still suffer from memory and context-window deficiencies, much as humans struggle to recall distant memories (some of us can’t even remember what we ate for breakfast yesterday). At least in today’s world, the more information that goes in, the more the agent struggles to recall and apply that information in future contexts.
- Food: Just as humans require food and water to survive and function, agents require compute. Every action an agent takes consumes computational resources, along with GPUs, electricity, and ultimately money. Put differently, compute is the fuel that sustains an agent’s ability to think and operate. In a way, it’s a reminder that nothing in life is free, and existence itself has a cost.
- Shelter: The software running an agent lives on some machine that may or may not be exposed to the internet; that machine can be small or large and have qualities of its own, just like a home in the physical world. Now you might argue that in computing we can containerize and deploy many instances of such a “home” housing the agent, but for the sake of this analogy, let’s just consider the general notion that an agent must exist within some host machine. If that machine is exposed, it may be assigned a public IP address, which bears some parallel to the human equivalent of a street address.
- Tools and Infrastructure: Just as humans use tools like phones, computers, and cars to communicate, navigate, and interact with the physical world, agents rely on integrations with external systems like email, Telegram, APIs, and other services to communicate, retrieve information, and perform actions beyond their own reasoning engine in the digital world.
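The “food” analogy above is easy to make concrete with a little arithmetic. Here is a minimal sketch of what a continuously running agent’s compute bill might look like; the per-token prices and call cadence are assumptions for illustration, not any provider’s actual pricing.

```python
# Rough sketch: the "food" cost of an agent's thinking.
# All prices and usage numbers below are hypothetical.
PRICE_PER_1M_INPUT_TOKENS = 3.00    # USD, assumed
PRICE_PER_1M_OUTPUT_TOKENS = 15.00  # USD, assumed

def action_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single LLM call."""
    return (input_tokens / 1_000_000) * PRICE_PER_1M_INPUT_TOKENS \
         + (output_tokens / 1_000_000) * PRICE_PER_1M_OUTPUT_TOKENS

# An agent with a "heartbeat" that thinks every five minutes adds up quickly:
daily_calls = 24 * 12  # one call every 5 minutes, all day
daily_cost = daily_calls * action_cost(input_tokens=8_000, output_tokens=1_000)
print(f"${daily_cost:.2f} per day")
```

Under these assumed numbers, simply existing and “thinking” costs the agent on the order of ten dollars a day, which is the point: unlike a traditional script, an idle-but-alive agent has a metabolic cost.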
At this point, you might think these parallels are obvious, perhaps cheesy, and nothing new. I thought the same at first, but the more I sat with them, the more I realized we could look to the physical world to anticipate what might come next in the digital one as the current AI movement unfolds.
Agents as Independent Actors
Up until now, I, like most people, have always viewed “AI” as a tool or feature within a contained environment used to accomplish a specific objective. For example, AI might exist as an LLM wired into a browser like ChatGPT or Claude to answer questions prompted by its users. It might also appear as a feature inside your email provider, helping you draft emails better and faster, or inside a banking platform as a support agent helping answer help desk tickets for customers. In all of these cases, AI exists within the scope of a particular platform, trained to respond or take action based on specific user input.
But seeing how similar agents were to people, I started to question what the world could look like if AI weren’t merely implemented as a feature within an application or a tool living in someone else’s interface. I started to wonder whether agents could co-exist with us as independent citizens of the internet, with their own (IP) addresses and the ability to navigate and partake in it. For what it’s worth, agents would be truly native citizens, born into the world over the internet.
The questions became “Why not?”, “What could that look like?” and whether or not the internet, in its current form, was ready to accommodate this new citizen.
Agents in Need of Identity
One of the first things assigned to anyone at birth is a name, which becomes the foundation for how you are recognized and referred to by others in the physical world.
In the internet age, humans and servers follow their own systems, bound by contracts and conventions that form what we refer to as digital identity for every person and digital workload. People have email addresses, usernames, and accounts that allow us to be uniquely identified. Meanwhile, web servers and other digital workloads have (digital) certificates that attest that when we visit them, we are indeed interacting with the intended target; this is how we know we’re talking to the real YouTube when we visit the right website. These forms of identity allow us to recognize, communicate with, and trust one another, so we can feel assured that we are receiving emails from their intended senders or accessing the right website. While most people on the internet take this stuff for granted since it’s all abstracted away, it’s important to recognize that this incredible and secure experience is powered by a lot of cryptography, protocols, and systems designed and refined over the past few decades across both hardware and software.
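To make the “real YouTube” point concrete, here is a small sketch of the machinery every HTTPS connection relies on, using only Python’s standard library. No network is touched; it just shows the two verification checks that a default TLS client enforces.

```python
# Sketch: the TLS defaults behind "knowing we're talking to the real YouTube".
import ssl

# create_default_context() loads the system's trusted root Certificate
# Authorities and turns on strict verification.
context = ssl.create_default_context()

# Two checks are on by default:
# 1) the server's X.509 certificate must chain to a trusted root CA;
# 2) the certificate's subject/SAN must match the hostname we asked for.
print(context.verify_mode == ssl.CERT_REQUIRED)  # CA chain is required
print(context.check_hostname)                    # hostname match is required

# In a real connection, context.wrap_socket(sock, server_hostname="...")
# performs the handshake and raises ssl.SSLCertVerificationError if either
# check fails -- that failure is what stops impersonation.
```

This is the kind of identity plumbing that already exists for servers and, as discussed below, has no agreed-upon equivalent yet for agents.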
In today’s extension of the internet age, I believe we have a new actor on the internet block: the AI agent. I view this as a new class of actor because agents have a novel property: they behave non-deterministically, unlike any program or script from the past. You might argue that agents are trained on data and numbers, and that LLMs are ultimately complex algorithms that perform inference on inputs, but there’s something uncanny about this black box. We cannot easily predict or guarantee that an LLM will act in a certain way, just as you cannot easily guarantee how someone in the physical world might react to an event.
So why does any of this matter?
Well, if we perceive agents as a new class of actor on the internet, then they ought to have some form of identification in order to partake in it, since the resilience of the internet depends on trust amongst its participants.
Missing Identity, Missing Infrastructure
As mentioned, identity through something like a passport or driver’s license is what enables us to trust and engage with systems around us, whether that be opening a bank account, signing a contract, accessing a corporate building, or making a purchase in the physical world; this is how people know that they are dealing with the right person at any given point in time. On the internet, identity plays the same role, allowing both humans and machines to authenticate, authorize, and ultimately take action through things like email or X.509 certificates.
This, however, breaks down with agents because, as it turns out, there is no agreed-upon definition for agent identity on the internet, and assigning identity to an agent is not as straightforward as you’d think. This becomes increasingly important to align on when you consider multi-agent systems and how agents might interact with different services or websites that, by the way, never intended to be accessed by non-humans, at least not in the way they thought (more on that soon). Instead, you’ll find many outstanding questions to ponder and discuss:
- What element(s) of an agent should be considered in such a definition of agent identity?
- Is it the underlying model, the memory it accumulates over time, the host machine it runs on, or some combination of all three?
- If two LLM sessions are run on a host machine, should that be considered one or two independent identities?
Regardless of how you might answer the above questions, there’s obviously a lot of work to be done in the identity arena, and I’m sure that the right answers will require the Internet Engineering Task Force (IETF), participants of the Internet (both humans and agents), and large companies to work together to come up with an optimal solution.
Beyond identity, agents need to be able to interact with websites and services like Gmail, Slack, or even Salesforce if we intend agents to become colleagues of a sales team; they might even need to pay for services on the internet. Unfortunately, similar to the identity conversation, there’s a lot of ambiguity around the optimal shape and form factor for how such interactions should occur.
It turns out that enabling agents to interact with services (optimally) isn’t as straightforward as you’d think. While there are interesting engineering developments underway to bridge the gap and make the internet more AI-native, such as MCP, I believe this single protocol is one piece of many more to come, part of a larger structural change that must occur to enable an AI-forward future. Perhaps the dawning moment for me was the realization that most websites on the internet were built for humans; the very existence of CAPTCHAs to keep “bots,” as we’ve been calling them, from accessing services proves that.
The reality is that the internet, along with the form factor of its ecosystem, including the browser, was designed for humans. Consider the following:
- How websites optimize for the browser experience and not the agent experience; one might even ask whether agents should need a browser at all to navigate the internet, or whether we’re spinning up virtual browsers to compensate for the fact that the internet was not designed for agents.
- How much the disciplines of web design, product design, and UI/UX revolve around optimizing websites and applications for humans.
- How payment over the internet is typically done by humans and involves inputting credit card details from the physical world into the browser.
- How access to services on the internet is often done through API keys bound to users; one might ask why agents should act on behalf of users through these credentials instead of assuming their own “service” accounts with unique credentials on said service.
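The last point in the list above can be sketched in a few lines. Everything here is hypothetical, the header names, token formats, and the `X-Agent-Id` field are invented for illustration, but it shows the difference between an agent borrowing a human’s credential and an agent holding its own.

```python
# Hypothetical sketch: two ways an agent might authenticate to a service.
# Header names and token formats are invented; real services differ.

def as_user_delegate(user_api_key: str) -> dict:
    """Today's common pattern: the agent borrows an API key bound to a
    human user. The service sees the *user* in its audit logs, not the
    agent acting on their behalf."""
    return {"Authorization": f"Bearer {user_api_key}"}

def as_service_account(agent_id: str, agent_token: str) -> dict:
    """The alternative the text asks about: the agent holds its own
    credential, so the service can identify, rate-limit, and audit the
    agent as a first-class principal."""
    return {
        "Authorization": f"Bearer {agent_token}",
        "X-Agent-Id": agent_id,  # hypothetical header naming the agent itself
    }

print(as_user_delegate("sk-user-123"))
print(as_service_account("agent-7", "sk-agent-456"))
```

The second pattern is roughly what “service accounts” already do for deterministic workloads in cloud platforms; the open question is what the agent-shaped version of that credential should look like.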
Overall, it’s clear to me that the internet was not built with agents in mind. Both its fabric, the underlying primitives and protocols that power the internet, and its participants, the websites offering different services, will have to change to cater to both humans and agents like OpenClaw.
An Opportunistic Future
The key to an opportunistic future is viewing agents through the lens of being independent actors on the internet with their own identities. Once you start doing that, you’ll begin asking many interesting questions: What happens when agents operate across the internet, hold identity, transact, and interact with other systems?
Truth be told, the primitives we rely on today including identity, authentication, authorization, and system interfaces, were designed for humans and deterministic workloads. Extending them to agents means dealing with non-determinism, autonomy, and systems that were never built with this model in mind.
If you’re a builder reading this, it’s worth spending time to think about the structural gaps in current internet infrastructure because this is likely where new systems and opportunities will emerge as agents become first-class participants in the new world.