Riddle me this: if your test suite breaks every time a button moves, a div changes, or – gods forbid – an A/B test runs, is it really testing anything?
If you had to pause, chances are your team – like most engineering teams – is spending far too much time fixing and maintaining script-based tests that break with every UI change.
Luckily, with AI in the workflow, release cycles shorten, the UI doesn’t break, and everything is fine, right? Right?
In this post, we’ll break down what AI testing is and how QA agents work under the hood – so you can decide if it’s actually any help for your team.
What Traditional Test Automation Does
Test automation tools like Playwright or Selenium follow a set of step-by-step directions. A script goes something like: go to this URL, find the element with this specific CSS selector, click it, assert that this text appears… It all works great, as long as your product never changes.
But there’s something called the selector treadmill: scripts break not only when the UI changes, but whenever the instructions behind them go stale. Teams report spending 30–40% of their dev time maintaining existing tests rather than finding and fixing real bugs or building new features. AI code generation has only made good tests harder to maintain.
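The treadmill is easy to see in miniature. The sketch below, a toy stand-in for a real Playwright or Selenium script, models a page as nothing more than a dict mapping CSS selectors to element text (the selector names are invented for illustration):

```python
# A minimal sketch of the "selector treadmill". A real script would drive a
# browser; here the page is just a dict of {selector: visible text}.

def scripted_test(page: dict) -> bool:
    """Step-by-step directions: find the element by its exact selector,
    'click' it, and assert the expected text appears."""
    button = page.get("#add-to-cart")  # hard-coded selector
    if button is None:
        raise LookupError("selector '#add-to-cart' not found")
    return button == "Add to Cart"

# The script passes against today's UI...
v1 = {"#add-to-cart": "Add to Cart"}
assert scripted_test(v1)

# ...but a harmless refactor that renames the id breaks the test,
# even though the feature still works perfectly for real users.
v2 = {"#cart-add-btn": "Add to Cart"}
try:
    scripted_test(v2)
except LookupError as e:
    print("test broken by UI change:", e)
```

Nothing about the user-facing behavior changed between the two versions; only the selector did. That gap is where maintenance time goes.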
Tools like Cursor, Claude Code, and Copilot are helping companies ship code faster than ever. But more output also means more UI changes per sprint, more code refactors, more components being rewritten… Each and every one of those brings a risk of your testing workflow breaking.
According to a Forrester TEI study on automation platforms, high-performing teams without a proper QA solution were seeing around 20 bugs per sprint reach production within a two-week cycle. As code volume grows, that number keeps climbing, and neither manual testing nor classic test automation can keep up.
The unfortunate reality is that traditional test automation does deliver excellent return on investment (ROI) when things are stable (the same Forrester study reports a 209% ROI over three years for one of these platforms). But that assumes a pre-AI level of development stability that doesn’t exist any more. Instead of helping, scripted tests quickly become liabilities. They start slowing you down, because keeping them up to date becomes a job in itself.
Enter agentic testing.
What Is Agentic Testing?
Simply put, agentic testing focuses on an AI agent achieving a given goal. You don’t tell the tool how to test (click this, then assert that); you tell it what to verify.
Here’s an example statement: “Make sure the user can successfully add an M-size hoodie to the cart and complete checkout with Google Pay”.
The AI agent is tasked with carrying out that goal.
With agentic testing, the QA agent receives a goal from the user and then figures out how to complete it inside the system. It navigates across web, desktop, or mobile apps and interacts with elements. Then, it checks whether the goal was actually achieved.
ReAct Pattern
Most agentic systems, including the one we’ve built at QA.tech, follow the ReAct pattern:
observe → decide → act → evaluate
- Observe: The QA agent first looks at the current state of the page, both the DOM and the visual layout.
- Decide or think: It reasons about the goal. “I need to find the ‘Add to Cart’ button. I see a blue button with a cart icon.”
- Act: It performs the action – clicking, typing, scrolling – the way a real user would.
- Evaluate: It checks if the action worked and decides what to do next (repeats).
Here, we’re talking about an autonomous system that understands the structural and visual hierarchy of a web app, and recreates a path that a user would take.
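The loop above can be sketched in a few lines. This is a toy model, not QA.tech’s implementation: the page is a dict, the goal is a predicate, and the “decide” step is a deliberately naive label match (all names here are invented for illustration):

```python
# A minimal sketch of the observe -> decide -> act -> evaluate loop,
# assuming a toy page model and a goal expressed as a predicate.

def react_loop(page: dict, goal_done, max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        # Observe: snapshot the current state (elements and their labels).
        elements = page["elements"]
        # Decide: pick the element whose visible label looks most relevant.
        candidate = next(
            (e for e in elements if "cart" in e["label"].lower()), None
        )
        if candidate is None:
            return False
        # Act: interact the way a user would.
        candidate["action"](page)
        # Evaluate: did the action achieve the goal? If not, loop again.
        if goal_done(page):
            return True
    return False

# Toy page: clicking the cart button puts the hoodie in the cart.
def add_to_cart(page):
    page["cart"].append("hoodie-M")

page = {
    "cart": [],
    "elements": [
        {"label": "Search", "action": lambda p: None},
        {"label": "Add to Cart", "action": add_to_cart},
    ],
}
assert react_loop(page, lambda p: "hoodie-M" in p["cart"])
```

The key design point is that the goal check (`goal_done`) is separate from the action selection: the agent can take a different path on every run, as long as the end state satisfies the goal.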
Memory Layer
QA.tech creates a structural understanding of your application before running a single test. Our agents crawl your website or app to map all the pages, flows, interactive elements, and the relationships between them into a knowledge graph.
Let’s use our map analogy once again. Think about the difference between a tourist wandering a city and a local who knows every street by heart. Both can go from A to B, but because of their familiarity with the city, the local knows how to take shortcuts and where the dead ends are.
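To make the analogy concrete, here is a minimal sketch of what a crawled knowledge graph might enable, assuming the crawl produces pages as nodes and navigable links as edges (the page names are hypothetical). With the graph in memory, the agent can plan a route to a goal page like the local who knows the streets, instead of wandering:

```python
# A minimal sketch of the memory layer: a crawled site as a graph of pages,
# and breadth-first search to find the shortest known route to a goal page.
from collections import deque

site_graph = {
    "home": ["search", "account"],
    "search": ["results"],
    "results": ["property"],
    "property": ["booking"],
    "booking": ["confirmation"],
    "account": [],
}

def plan_route(graph, start, goal):
    """BFS over the crawled graph: returns the shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(plan_route(site_graph, "home", "confirmation"))
# -> ['home', 'search', 'results', 'property', 'booking', 'confirmation']
```

A tourist-mode agent would have to rediscover this path by trial and error on every run; the graph turns it into a single lookup.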
How Agentic Testing Works in Practice
Before you write a single test, the agent crawls your app. QA.tech calls this epistemic foraging, which is basically agents autonomously exploring your app to map and understand the user flows and the UI elements.
Intent
You write a simple and natural prompt to tell the agent what the end goal is, such as “Verify that the user can successfully search, browse, view, and book a property.”
Discovery
The agent then loads the application using the structure it learned during the crawl. Now that it has a map of the pages and elements, it can look for the homepage, the browse-properties feature, booking buttons, and every other required element the way a human user would.
Execution Flow
The agent proceeds to complete the test run step by step (you can watch the full process through a session recording). If something unexpected happens, like a pop-up suggestion, the agent sees it, recognizes it as an obstacle to the goal, and closes it.
Assertion
Once the flow is completed, the agent evaluates pass or fail based on whether the goal has been met.
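A goal-level assertion looks quite different from a selector-level one. In this sketch the final app state is summarized as a dict (the field names are invented for illustration); pass/fail is judged against the stated goal, not against a specific confirmation element or URL:

```python
# A minimal sketch of a goal-level assertion, assuming the run's final state
# is summarized as a dict of observed facts.

def goal_met(state: dict) -> bool:
    """The stated goal: a booking was completed for some property."""
    return bool(state.get("booking_confirmed")) and state.get("property") is not None

# Two very different final screens can both satisfy the same goal:
classic_flow = {"booking_confirmed": True, "property": "Seaside Loft",
                "url": "/confirmation"}
redesigned_flow = {"booking_confirmed": True, "property": "Seaside Loft",
                   "url": "/bookings/42/success"}  # page renamed and split

assert goal_met(classic_flow)
assert goal_met(redesigned_flow)  # no URL or selector assertion to break
```

A script asserting `url == "/confirmation"` would fail the redesigned flow even though the user got exactly what they wanted.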
Now, compare this result to the equivalent Playwright script. You have to go to the /home page, locate the property search input using a specific selector or data-testid, enter a location, trigger the search, wait for the results page to load, click on a property listing, and then find and press the booking button. Finally, you need to check if the confirmation page or message shows up.
It all works, as long as nothing changes. The moment a CAPTCHA is added, a test ID is renamed, or the booking process is split across additional pages, you will be back in maintenance mode.
What Makes Agentic Test Automation Different from AI-Assisted Testing
Honestly, a lot of tools marketed as AI test automation are really just wrappers that generate scripts for frameworks like Playwright. True, they get you higher test coverage faster, but the scripts are brittle and unreliable in exactly the same way.
We think using AI this way applies it to the wrong paradigm – the outcome of an AI wrapper is worse than that of an AI agent acting independently. When evaluating AI testing tools, always look for these five markers:
- Goal-driven: Tests are defined by outcomes, not by the implementation details of how to get there.
- Perceptual: The agent views your application the way a real user would (visually and via the HTML). It doesn’t rely on hand-written selectors to reference an element, which is why agentic tests don’t break on UI modifications.
- Adaptive: This is the agent’s ability to self-heal. If you move a “Submit” button, or add an extra step as an A/B test, the agent still finds its way to the goal, even though the original elements moved or the path changed.
- Self-evaluating: In agentic testing, the agent determines pass or fail based on whether the stated goal was achieved. Tests stay aligned with user intent even as the codebase evolves underneath.
- Continuously learning: The more the agent interacts with your application, the better it gets at recognizing happy-path scenarios and what counts as “normal” behavior for specific user interface components.
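The “perceptual” and “adaptive” markers in particular can be sketched together. In this toy model (element fields and names are invented for illustration), the lookup matches on what a user sees – visible label and role – and ignores ids entirely, so a renamed id or a moved button doesn’t break it:

```python
# A minimal sketch of perceptual, self-healing element lookup: match on the
# user-visible label and role, never on ids or CSS selectors.

def find_like_a_user(elements, label_hint, role="button"):
    """Return the first element a user would recognize as the target."""
    for el in elements:
        if el["role"] == role and label_hint.lower() in el["label"].lower():
            return el
    return None

before = [{"id": "btn-7", "role": "button", "label": "Submit"}]
after_redesign = [  # id renamed, label tweaked, button moved in an A/B test
    {"id": "cta-primary", "role": "button", "label": "Submit order"},
]

assert find_like_a_user(before, "Submit") is not None
assert find_like_a_user(after_redesign, "Submit") is not None  # still found
```

A selector-based lookup keyed to `#btn-7` would have failed on the redesign; the perceptual lookup survives because the thing a user perceives – a button that says “Submit” – is unchanged.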
When Agentic Testing Is (and Isn’t) the Right Fit
It’d be easy to oversell this, so let me be straight: I don’t think you should throw out every Playwright script you own tomorrow.
There are some areas where this approach is a really strong fit:
- End-to-end (E2E) user flows: Anything involving onboarding, checkout, account management, and CRUD operations.
- Regression suites: Continuously changing UIs that ship faster than you can manually test.
- Fast-moving UIs: Validating new releases on time.
- Complex products: Products where the best way to validate the experience used to be manual testing, but, for obvious reasons, it doesn’t happen fast enough.
However, at the moment, it’s a less-than-ideal solution for:
- Highly interactive apps, like Notion. WebGL games or WebGL-based UIs are also hard for agents to test.
- UIs that are highly dynamic from session to session.
The reality is that most teams we talk to use a hybrid approach. They rely on scripts for small details and let AI agents handle the broad and complex user flows.
That being said, we’ve seen companies adopting agentic QA gain up to 529% ROI with a 3-month payback.
Wrapping Up
Agentic testing represents a completely different approach from traditional testing frameworks, with a one-to-one relationship between your desired goals and your actual results. There are no brittle tests in between to collapse when your development team ships at lightning speed.
If your team is spending more time maintaining test infrastructure than finding real bugs, agentic test automation can help you close that gap.
Want to go deeper? Here are some useful materials:
- From Manual to Autonomous QA: A Step-by-Step Transition Guide
- Is QA Automation Worth It? The Real ROI of Intelligent Testing
Book a demo with QA.tech and see how our agents can validate your critical flows for your next release.