Cloud dashboards show you the problem.
They don’t solve it.
Every organization running in AWS, Azure, or GCP eventually faces the same issues:
- Idle compute running for months
- Overprovisioned instances
- Orphaned storage
- No clear optimization decisions
- No ownership
What teams need is not another dashboard.
They need an intelligent control plane.
In this article, I’ll walk through how to build a production-ready multi-agent FinOps system powered by:
- FastAPI (backend orchestration)
- LLMs (structured reasoning)
- React (dashboard UI)
- Docker (deployment)
This is implementation-focused. Minimal theory. Real architecture.
Architecture
The Problem: Cost Data Without Decisions
Most FinOps tools stop at:
- Cost visualization
- Alerts
- Basic rule-based recommendations
But real optimization requires reasoning:
Should we downsize this instance?
Is this idle volume safe to delete?
What’s the performance risk?
Static rules are too rigid.
Pure AI is too risky.
The solution: Rule-based triggers + LLM reasoning + human approval.
System Architecture Overview
At a high level:
User → React UI → FastAPI → Agents → LLM → Structured Output → Human Approval
We separate responsibilities clearly:
- UI handles interaction
- API orchestrates
- Agents apply logic
- LLM provides contextual reasoning
- Humans approve execution
This keeps the system enterprise-safe.
The Multi-Agent Design
Instead of one monolithic “AI service”, we use specialized agents.
1. Diagnostic Agent
Detects inefficiencies and optimization opportunities.
2. Idle Cleanup Agent
Identifies unused resources that may be safely removed.
3. Rightsizing Agent
Recommends better instance sizing based on usage trends.
Each agent follows the same pattern:
- Apply deterministic rules
- Construct a structured context
- Call the LLM with constrained instructions
- Validate JSON output
- Return recommendation
This is not chat AI.
This is constrained reasoning.
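The shared agent pattern above can be sketched as a small base class. This is a minimal sketch: the `llm_client` interface and its `complete` method are assumptions, not a specific library's API.

```python
import json


class BaseAgent:
    """Shared agent pipeline: rules -> context -> LLM -> validated JSON."""

    def __init__(self, llm_client):
        self.llm = llm_client  # hypothetical client exposing complete(prompt) -> str

    def matches_rules(self, resource: dict) -> bool:
        raise NotImplementedError  # deterministic filter, defined per agent

    def build_context(self, resource: dict) -> str:
        raise NotImplementedError  # structured prompt, defined per agent

    def analyze(self, resource: dict):
        # 1. Apply deterministic rules first; skip the LLM entirely on no match.
        if not self.matches_rules(resource):
            return None
        # 2. Construct a structured context and call the LLM with it.
        raw = self.llm.complete(self.build_context(resource))
        # 3. Validate JSON output; reject anything unparseable.
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return None
```

Each concrete agent then only supplies its own rules and prompt construction; the orchestration and validation stay identical.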
Backend: FastAPI as the Control Plane
FastAPI acts as the orchestrator.
Example endpoint:
```python
@app.post("/analyze/idle")
def analyze_idle():
    data = fetch_cloud_metrics()       # pull current telemetry
    result = idle_agent.analyze(data)  # agent applies rules + LLM reasoning
    return result
```
Responsibilities:
- Route requests to the correct agent
- Inject telemetry
- Enforce policies
- Log all decisions
- Validate structured responses
The API layer is critical.
It prevents LLM outputs from directly impacting infrastructure.
Inside an Agent
Here’s what a simplified DiagnosticAgent looks like:
```python
class DiagnosticAgent:
    def __init__(self, llm_client):
        self.llm = llm_client

    def analyze(self, resource):
        # Deterministic pre-filter: only escalate long-running, low-CPU resources.
        if resource["cpu_avg"] < 5 and resource["days_running"] > 14:
            return self._call_llm(resource)
        return None
```
Notice:
We do not send everything to the LLM.
We filter first.
This reduces:
- Cost
- Latency
- Hallucination risk
Constrained LLM Prompting
We never ask open-ended questions.
We use structured prompts:
```text
You are a FinOps optimization engine.

Given:
- cpu_avg: 2%
- monthly_cost: $430
- environment: production

Return:
- recommendation
- risk_level
- estimated_savings
- justification

Output JSON only.
```
We force:
- Role clarity
- Schema constraints
- Deterministic structure
The output must look like:
```json
{
  "recommendation": "Downsize to t3.medium",
  "risk_level": "Low",
  "estimated_savings": 180,
  "justification": "CPU utilization below 5% for 30 days"
}
```
If parsing fails, we reject it.
Never pass raw model text downstream.
The Idle Cleanup Agent
This agent is more sensitive.
Deletion is high risk.
Example logic:
```python
if resource["attached"] is False and resource["days_idle"] > 30:
    flag = True
```
The LLM is not deciding whether to delete.
It classifies:
- Risk level
- Compliance concern
- Savings estimate
Human approval is mandatory.
The Rightsizing Agent
Rightsizing requires trend awareness.
We analyze:
- Average CPU
- Peak CPU
- Memory utilization
- 30-day stability
Example:
```python
if cpu_avg < 40 and cpu_peak < 60:
    candidate = True
```
The LLM suggests a smaller instance while respecting performance buffers.
Again:
Recommendation, not execution.
React Frontend
The React dashboard shows:
- Optimization opportunity
- Risk level
- Estimated savings
- Confidence score
- Approve/Reject button
This turns AI output into decision support.
Not automation.
Human-in-the-Loop Execution
Execution flow:
Frontend → Backend → Cloud API → Confirm status
Key safeguards:
- No production deletion without approval
- Snapshot before resize
- Post-change monitoring
- Full audit logging
AI assists. Humans decide.
Dockerized Deployment
We containerize:
- FastAPI service
- React frontend
- Optional Redis / Postgres
Example Dockerfile:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Copy requirements first so dependency layers are cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
This allows:
- Reproducible environments
- Cloud-native deployment
- Easy scaling
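A minimal `docker-compose.yml` tying the services together might look like this (build paths, ports, and image tags are illustrative):

```yaml
services:
  api:
    build: ./backend        # FastAPI service
    ports:
      - "8000:8000"
  web:
    build: ./frontend       # React dashboard
    ports:
      - "3000:80"
    depends_on:
      - api
  redis:                    # optional cache / queue
    image: redis:7-alpine
```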
Production Hardening
This is where most AI projects fail.
Enterprise safeguards include:
- Schema validation on every LLM output
- Observability (log prompts + responses)
- Retry logic with backoff
- Environment restrictions (prod guardrails)
- Role-based access control
- Versioned prompts
- Rate limiting
AI without guardrails is a liability.
AI with structure becomes leverage.
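As one concrete example from the list above, retry logic with backoff around a flaky LLM call fits in a few lines:

```python
import random
import time


def call_with_backoff(fn, retries: int = 3, base_delay: float = 0.5):
    """Retry a flaky call (e.g. an LLM API request) with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # retries exhausted: surface the error to the caller
            # Exponential backoff with jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```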
Why This Architecture Works
It balances:
Rules + LLM + Humans
Instead of replacing decision-makers, it augments them.
The LLM:
- Explains
- Quantifies
- Suggests
It does not:
- Execute
- Override policies
- Bypass governance
That separation is what makes this production-ready.
Key Takeaways
- Don’t build AI monoliths — build specialized agents.
- Always filter before calling the LLM.
- Constrain prompts with explicit schemas.
- Validate outputs before using them.
- Keep humans in the loop for infrastructure changes.
- Log everything.
This is how you move from an AI demo to an enterprise system.
Final Thought
FinOps dashboards show cost.
Agentic AI systems generate action.
When designed correctly, multi-agent architectures can transform cloud cost management from reactive reporting to intelligent optimization.
The difference is not in using an LLM.
The difference is in how you architect around it.
Let’s connect 👇
🔗 LinkedIn:
https://www.linkedin.com/in/dhiraj-srivastava-b9211724/
💻 GitHub (Code & Repositories):
https://github.com/dexterous-dev?tab=repositories