As we are building more and more real-world agentic technologies from controlled prototypes, security guardrails are fundamental for business use cases. Prototypes built by research scientists and software engineers do not adhere to the security/privacy standards that a scaled product needs when dealing with customers’ personal information like credit card details, address, age, etc.


In traditional software, the boundaries for accessing the data layer, other service API endpoints are all explicit and can be tuned based on resource/action policies. Hence, there are deterministic outcomes when developing the service. However, in an AI agent-driven workflow, the line blurs, and sometimes it's also called “fuzzy-logic”. A single model can summarize sensitive text, access private or internal APIs based on multiple user contexts, without realizing that this information should remain confidential. To achieve complete autonomous agents performing product operations, privacy design is not optional, and becomes the most important feature.


The power of AI agents raises a key question: how do you stop an intelligent system from doing something it is not supposed to do?

The Risks of Autonomy

Most developers worry about hallucinations or poor reasoning in LLMs (Large Language Models). Very few realize that the same behaviors can quietly leak sensitive data. 


For example, in a production system, let’s assume the agentic system helps you book your favorite NBA team’s basketball game ticket. As a software company that makes that application, would you rather fail the ticket purchase due to hallucination but no customer information leak, or a successful purchase at the cost of a data leak?


This can happen in a few ways:

1. Context persistence: Agents often cache prior inputs for coherence. Hence, it is important to manage caches that store private tokens, emails, or identifiers longer than intended.

2. Cross-tenant drift: When an agent is running for multiple customers or domains, it can lead to context overlap that exposes sensitive data between users.

3. Prompt injection and jailbreaks: A prompt attack can manipulate the agent’s input to be revealed or alter the input itself from the intended one. There needs to be guardrails to help the agent differentiate between user prompts and developer guidelines.

4. Third-party dependencies: External APIs used by the agent may log or store payloads without the user’s knowledge.


Principles of Privacy-by-Design 

Similar to the best practices for a distributed software system, we need standards for privacy-based architecture for agentic systems. The most important principles used today:


1. Isolation by Context: Each agent instance should operate in a short-lived sandbox, an independent execution and memory space. This is very similar to any EC2 container in maintaining a consistent environment. Context can be passed explicitly, and not implicitly. When the task ends, memory is destroyed without any persistence duration. 


2. Token-Level Redaction: Before any data reaches the model layer, it must pass through a sanitization filter. The filters include named entity recognition or lightweight regex scanners to identify common industry patterns like credit card numbers, email addresses, or access keys. These can be replaced with placeholders, which do not preserve any meaning or context, thus removing data sensitivity.


Def
    redact_sensitive_data(text):
    Patterns=[r"\b\d{16}\b",r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-       Za-z]{2,}\b"]
    For p in patterns: 

        Text = re.sub(p,"[REDACTED]",text) 

        Return text


Even a simple pre-processor like this dramatically reduces downstream risk.


The above pre-processor function:


3. Ephemeral Data Pipelines: Maintain a configuration for data persistence duration after which any logs, embeddings, and temporary caches will expire automatically. It is a common theme throughout the industry, engineers and scientists retain data for a longer time than needed in order to debug and train models, respectively. A strict configuration helps remove that issue.


4. Policy Enforcement as Code: Instead of having all the privacy logic deep inside the package/modules, it is a good idea to implement it as a dedicated policy engine. The engine evaluates every operation with set rules defined by engineers/scientists or policies defined by the product regarding who can access what, for how long, and under which role.


5. Output Validation: Similar to the input sanitization, an output from an agentic system should also be filtered before exiting to other systems or external public APIs. This is to maintain data/privacy policies


Layered Architecture for Safe Autonomy

A practical way to think about privacy guardrails is as a reverse proxy around your agent. You can visualize them using the following layers:


  1. Input Filter:  Its simply a filter that can mask or reject sensitive data.
  2. Policy Engine: A rule engine that checks user permissions and audit rules.
  3. Execution Sandbox: This is like a containerized environment where the agent runs with scoped, temporary credentials.
  4. This layer can be broken down even further as the LLM layer, planning layer and the act layer as well
  5. Output Filter: Similar to input filter, it redacts or flags responses that contain restricted information.
  6. Observability Layer:  Collects data for monitoring the system


Cascading effect of corruption in Agentic AI Systems

Even with multi-layer filtering in place, we still see data leaks like Salesforce’s Agent Force.

If one layer is compromised by an incorrect configuration, prompt injection, or unexpected API behavior, it can lead to failure that ripples outward and contaminates the entire pipeline.


Effects of a compromised software system 

Layer

If Compromised

System-Wide Effect

Input Filter

Malicious or sensitive data passes undetected

LLM memorizes or exposes confidential tokens

Policy Engine

Elevated permissions 

Unauthorized actions by the agent

LLM

Corrupt input can shift reasoning or output bias

All downstream steps inherit corrupted logic

Orchestrator/ Planner

Workflow routing hijacked

The agent executes incorrect or dangerous sequences

Output Filter

No validation of generated text

Sensitive data or harmful content reaches end users

Observability Layer

Raw data logged persistently

Compromised data re-enters training or analytics loops


As discussed in the above section multiple layers of security provide "defense in depth". But as these layers are sometimes tightly coupled based on shared vector store or session memory, cookies, etc, a compromise at one of these layers can cause unintended access to the rest.

Guardrails as Containment Walls

Privacy guardrails provide a way to limit the blast radius of such a failure. They are similar to circuit breakers that protect other components in the system when one component overheats from a power surge. It is like de-coupling the system at runtime, also known as "andon chord" in the software industry.


A robust design ensures that:



In other words, guardrails don't just prevent bad behavior; they contain it.


Testing Guardrail Effectiveness

Any good software system should have a feedback loop that collects certain metrics to determine if the system is running optimally. It is the same when it comes to the yearly maintenance of your car. The technician will go through the error codes on your car and determine if anything needs to be replaced. Similarly, a good security system needs to be tested very often with real-world data and synthetic data to measure the performance of the guardrails. Bad actors are always looking to compromise software systems, and to stay ahead, one needs to train the guardrails with more data.



The goal of any software engineer is to develop a system that optimizes for both precision and performance.


A Real World Example:


To scale agentic solutions in the real world, engineers need to strike a good balance between deterministic guardrails similar to rule engine and non-deterministic guardrails. For example, if there are terms and conditions being checked while logging in to create a profile, these highly important actions should be protected by both non-deterministic and deterministic guardrails. This two-layered approach can save incorrect action being performed by agent hallucination.


Some of the lessons that I learnt

I learn the following after deploying and scaling an agentic system:


Building trust is the real competitive advantage. Users, regulators, and product teams can all move faster when they know the system is designed to fail safely.

Closing Thoughts

With this shift towards agentic technology, the decision-making layer has been abstracted, and it is no longer just software that follows instructions based on some static set of if/else conditions. This increases the autonomy of each piece of code one writes and can lead to non-deterministic outcomes from the intended code. We could potentially see more software reliability concerns, privacy leaks, both in terms of data and intellectual property. Privacy guardrails, including clear moral and technical boundaries, become very crucial during the design phase of a software architecture. The future of AI will be determined by "trust" of agentic systems at scale with a robust set of security features.


References:

https://arxiv.org/abs/2306.05499

https://arxiv.org/abs/2406.07904

https://arxiv.org/pdf/2405.14478