In the heat of the initial ChatGPT craze, I got a text from a former coworker who wanted to run an idea by me. Always one to enjoy a brainstorm, I hopped on a call with him, and he started off with “Remember how you used to always ask me to pull data for you? What if you could just do it yourself?” He then proceeded to pitch an idea that thousands (tens of thousands?) of other people were having at the same time: LLMs could be used for text-to-SQL to help less technical folks answer their own data questions.

I was hooked on the idea, but before diving in head first, I told Lei (now my CTO) that we had to do some validation. We contacted friends and former coworkers from various industries. There was a strong interest in real "self-service analytics." We knew it would be much more complicated than it seemed, but the opportunity felt too good to pass up. So Lei and I left The Shire and embarked on our journey to create our vision: Fabi.ai.

This post isn’t about our product itself (though, if you’re curious, you can read more about how some of the ideas below informed our recent product work here). Instead, I wanted to share the core learnings we’ve collected from working with LLMs for data analysis along our journey.

Note: This journey is woefully lacking in wizards and epic Middle-earth battles. 🧙

Why use AI for self-service analytics?

We won’t linger on the “why” too long. If you’re reading this, you likely fall into one of two groups:

  1. You wish you had self-service analytics available and don’t want to always wait on your data team.
  2. You’re on the data team and have been hearing about how AI is going to solve your ad hoc request problem.

Setting aside concerns about the role of data analysts and scientists, the idea of an all-knowing AI that can answer any question about an organization’s data sounds nice. Or at least, it sounds nice for the organization and its business leaders, whose creativity for new ways of asking questions knows no bounds. This AI could be the solution to creating a “data-driven” organization where every leader leans on empirical evidence to make strategic decisions, and all at a fraction of the usual cost. Finally! Organizations can capitalize on that “new oil” they’ve been hearing about since 2010.

But if this is such a valuable problem to solve and AI has gotten so good, why has no product actually solved it thus far?

Why AI for self-service analytics has failed thus far

Recent industry surveys paint a complex picture of AI adoption in the enterprise: 61% of companies are trying out AI agents, yet many worry about reliability and security, and 21% of organizations don’t use them at all. These hesitations are felt particularly strongly within data teams, where accuracy and trustworthiness are mission-critical to our ability to do our work.

Adopters of AI, especially in the enterprise, hold the technology to a high bar. In the context of data analytics and the self-serve dream, we expect our AI tooling to:

  1. Provide insights: Tables and charts are great, but they’re only a subset of what one might call “insights”. Insights are “Aha!” moments that come from spotting things in your data that run counter to your intuition and that you would not otherwise have considered. Sometimes a SQL query or a pivot can surface these insights, but generally it feels much more like finding a needle in a haystack.
  2. Work reliably nearly 100% of the time: The only thing worse than no data is bad data. If the AI can’t be trusted or hallucinates answers and data, that spells bad news for everyone. When the AI has the data, it should use it correctly; when it lacks the data, it should decline to answer (something LLMs are notoriously bad at).
  3. Be accessible to a wide range of technical skill sets: The beauty of LLMs is that you can interact with them the way you would with a coworker over Slack. You can use vague language, and the other party can likely understand your request in its business context. Conversely, the more a system requires exact terms in an exact form, the less accessible it is, and the more training and reinforcement it requires, which, as we all know, can be challenging.

Sadly, most current solutions use a traditional monolithic AI framework, which often fails to meet these expectations. Over the past few years, the Fabi.ai team and I worked hard on this problem, building prototypes for the enterprise and exploring many options. In the end, we concluded that neither Retrieval-Augmented Generation (RAG) nor fine-tuning could fix the problem within the monolithic framework.

When we tested this approach, a few things became clear to us:

After looking at these issues, we thought about how to make AI adapt better to problems. That’s when AI agents came into play and solidified this concept for us.

The future: Agent meshes

The minute we laid eyes on agentic frameworks, we knew they would change the game. We could suddenly let the AI decide how to answer questions, work through steps, and troubleshoot on its own. If the AI writes a SQL query that misses null values in the “Account type” field, it can dry-run the query, spot the error, and fix it itself. But what if we could take this a step further and let the AI operate mostly in Python and leverage LLMs? Then the AI does more than pull data: it can use Python packages or LLM calls to surface outliers, trends, or unique insights that you would normally have to hunt for manually.
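That write-check-repair loop can be sketched in a few lines. This is a minimal illustration, not Fabi.ai’s implementation: `llm_fix_query` is a hypothetical stand-in for the real LLM call, and SQLite’s `EXPLAIN` serves as the cheap dry run that catches errors like a misspelled column before execution.

```python
import sqlite3

def dry_run(conn, query):
    """EXPLAIN parses and plans the query without executing it; return the error, if any."""
    try:
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as e:
        return str(e)

def llm_fix_query(query, error):
    """Hypothetical stand-in for an LLM call that rewrites a failing query.

    A real system would prompt the model with the schema and the error message."""
    return query.replace("acount_type", "account_type")

def run_with_retries(conn, query, max_attempts=3):
    """Dry-run the query; on failure, let the 'LLM' repair it and try again."""
    for _ in range(max_attempts):
        error = dry_run(conn, query)
        if error is None:
            return conn.execute(query).fetchall()
        query = llm_fix_query(query, error)
    raise RuntimeError(f"Could not repair query: {error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, account_type TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'enterprise'), (2, NULL)")

# The misspelled column triggers one round of self-correction before the query runs.
rows = run_with_retries(conn, "SELECT id FROM accounts WHERE acount_type IS NOT NULL")
```

The key point is that the agent gets a feedback signal (the dry-run error) it can act on autonomously, rather than handing a broken query back to the user.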

But we still had one problem: messy enterprise data. In theory, organizations could solve this with strong data engineering practices, like a medallion architecture and a strict semantic layer. In practice, we rarely found organizations that actually did this. Most run on spreadsheets, half-baked tables, and ever-changing data models. From there, we came up with the idea of specialized AI agents that can be built quickly to answer a specific set of questions.

As companies grow, they handle more data and have more users. The agent mesh idea helps balance quick decision-making with the control needed for governance. Specialized agents help set clear boundaries and responsibilities for each AI. They also create a scalable way for agents to communicate. Plus, they can help manage resources efficiently across teams and companies.

Specialized AI agents

The idea behind a specialized agent is that it can and will only answer questions about a very tightly defined dataset. For example, you can create and launch an AI agent that answers questions about marketing campaigns, another that answers questions about the marketing pipeline, and so on.
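A toy sketch of that scoping contract, under loud assumptions: the keyword-based `in_scope` check and the canned `answer_fn` are placeholders (a production system would use the LLM itself, or embeddings, to classify scope, and real analysis to answer). The point is only the shape: out-of-scope questions get a refusal, never a guess.

```python
class ScopedAgent:
    """An agent that only answers questions within its tightly defined domain."""

    def __init__(self, name, topics, answer_fn):
        self.name = name
        self.topics = topics        # terms that define this agent's domain (toy scope check)
        self.answer_fn = answer_fn  # callable that answers in-scope questions

    def in_scope(self, question):
        q = question.lower()
        return any(topic in q for topic in self.topics)

    def ask(self, question):
        # Refusing out-of-scope questions is what lets builders share the agent safely.
        if not self.in_scope(question):
            return f"{self.name}: out of scope, please ask another agent."
        return self.answer_fn(question)

campaign_agent = ScopedAgent(
    "Marketing campaign agent",
    topics=["campaign", "ad spend"],
    answer_fn=lambda q: "Campaign ROAS last quarter was 3.2x",  # stand-in for real analysis
)
```

Here `campaign_agent.ask("What was our campaign ROAS?")` returns the canned answer, while a pipeline question gets the out-of-scope refusal.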

We recently launched Agent Analyst using this architecture, and the early signs are very promising. When the datasets are carefully curated and at the right level of granularity, these agents can answer a specific set of questions extremely reliably. The builder of these agents can share them with non-technical users and rest easy knowing that the AI won’t answer questions that are out of scope.

There’s just one flaw: users need to know which agent to ask which question. It’s like needing to know the right marketing analyst to direct a question to, versus asking a general question that someone on the team can route to the right person. This is where the concept of an “agent mesh” comes into play.

Connecting agents together

If a single agent can reliably answer domain-specific questions, then why not let agents talk to each other? Why can’t the marketing campaign agent, for example, ask the pipeline agent directly whether it can answer a question more easily? We believe it should be able to. In fact, we think the future holds networks of agents with a hierarchical structure: picture a “GTM agent” that calls a “Marketing agent,” which in turn calls both a “Pipeline agent” and a “Marketing campaign agent.”
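The hierarchy above can be sketched as routers over leaf agents. This is an illustrative skeleton, not a real protocol: the agent names come from the example in the text, and the `can_answer` keyword check is a placeholder for whatever scope signal real agents would expose to each other.

```python
class Agent:
    """Leaf agent: answers questions within its own domain."""

    def __init__(self, name, topics):
        self.name = name
        self.topics = topics

    def can_answer(self, question):
        return any(t in question.lower() for t in self.topics)

    def ask(self, question):
        return f"{self.name} answered: {question}"

class RouterAgent(Agent):
    """Parent agent: delegates to the first child that claims the question."""

    def __init__(self, name, children):
        super().__init__(name, topics=[])
        self.children = children

    def can_answer(self, question):
        # A router is "in scope" if any agent below it is.
        return any(c.can_answer(question) for c in self.children)

    def ask(self, question):
        for child in self.children:
            if child.can_answer(question):
                return child.ask(question)
        return f"{self.name}: no child agent can answer this."

# Hierarchy from the example: GTM -> Marketing -> {Pipeline, Marketing campaign}
pipeline = Agent("Pipeline agent", ["pipeline"])
campaigns = Agent("Marketing campaign agent", ["campaign"])
marketing = RouterAgent("Marketing agent", [pipeline, campaigns])
gtm = RouterAgent("GTM agent", [marketing])
```

A user only ever talks to the top-level agent; `gtm.ask("How is the pipeline trending?")` gets routed down to the Pipeline agent, and a question no leaf claims comes back as unanswerable rather than hallucinated.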

This idea resembles a broader concept floating around the AI community known as the “Internet of Agents”: a future where AI agents collaborate smoothly across organizations while keeping security and trust intact.

This mesh approach offers a few key advantages over a monolithic AI (on a pristine semantic layer):

At the end of the day, the mesh idea isn’t novel. It mirrors the mixture-of-experts concept, which has been shown to improve accuracy for LLMs; we’re simply taking that same idea and bringing it to AI agents.

Technical challenges of agent meshes

At Fabi.ai, we still have a long way to go as we build out the Agent Analyst mesh, but we’ve already overcome some of the big technical infrastructure challenges along the way.

AI data analyst agents need a unique architecture: one that lets them use Python or LLMs to answer questions, stay in sync with data sources, and fit into collaborative platforms, all while remaining secure and scalable. Each agent needs to operate in its own Python kernel, which must spin up or down quickly to reduce costs and stay in sync with the source data.
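A minimal sketch of that kernel lifecycle, assuming a toy in-process model: `KernelPool` and its idle-reaping policy are hypothetical names, and the dictionary entries stand in for real containers or processes that a production system would launch and tear down.

```python
import time

class KernelPool:
    """Tracks one (simulated) Python kernel per agent, reclaiming idle ones to save cost."""

    def __init__(self, idle_timeout=300.0):
        self.idle_timeout = idle_timeout
        self.kernels = {}  # agent_id -> last-used timestamp

    def get_kernel(self, agent_id):
        # Spin up a kernel on first use and refresh its last-used time on every call.
        # A real system would launch an isolated container or process here.
        self.kernels[agent_id] = time.monotonic()
        return agent_id

    def reap_idle(self):
        """Shut down kernels idle longer than the timeout; return which were reaped."""
        now = time.monotonic()
        idle = [a for a, last in self.kernels.items() if now - last > self.idle_timeout]
        for agent_id in idle:
            del self.kernels[agent_id]  # a real system would tear down the container here
        return idle

pool = KernelPool(idle_timeout=0.0)  # zero timeout just to demonstrate reaping
pool.get_kernel("marketing-agent")
time.sleep(0.01)
reaped = pool.reap_idle()
```

The isolation per agent is what prevents one agent's state or credentials from leaking into another's environment; the reaping policy is what keeps a mesh of many agents affordable.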

Architectures that don’t provide individual kernels to each agent can run into one of the following risks:

The challenge of building this type of platform is as much an AI challenge as it is a DevOps challenge.

Looking ahead: Embracing specialized, governed AI agents in data

As enterprise companies manage more AI applications in their operations, they need specialized and well-governed approaches. The agent mesh framework uses specialized AI data agents as a means for scaling AI in data analytics. This approach keeps security, reliability, and performance intact.

We might have expected AI to be everywhere by now, answering most data questions. But if we look closely, the progress in just two years since ChatGPT launched is impressive. We still have much to learn on this journey, but in my mind, agents and agent mesh frameworks will be key to enterprise AI.