
How we built an agent to investigate failed tests and find the root cause of bugs.
How we built an agent to investigate failed tests and find the root cause of bugs.
AI engineer
Anyone who's spent time managing automated tests knows the frustration of troubleshooting mysterious test failures. Hours spent sifting through logs, console outputs, network requests, and screenshots can grind productivity to a halt. We believe QA engineers and developers should focus their time on building and enhancing great software - not digging through logs. That’s why we built the Triage Agent: an intelligent assistant designed to pinpoint the exact cause of test failures using advanced large language models (LLMs). These models power most of the analysis, enabling adaptive, context-aware reasoning that mirrors how an experienced QA engineer would approach an investigation - but at lightning speed and scale.
Imagine running your automated tests and immediately knowing not just that a test failed, but exactly why. No more guesswork or digging in logs, just actionable insights.
In this post, we’ll explore how the Triage Agent uncovers the root cause of test failures, providing deeper insights beyond the surface-level error messages.
The Triage Agent compares the failing test run with the previous passing run. The goal of this comparison is to filter out irrelevant error messages or behaviors that also appeared in the successful run. Playwright traces are often cluttered with third-party errors or warnings unrelated to actual test failures, which can mislead automated analysis.
The Triage Agent systematically processes test failure data through a structured, multi-phase pipeline. This pipeline leverages LLMs extensively, making informed decisions at each step based on the gathered evidence.
It begins by collecting extensive data from both successful and failed test executions, including: Test Steps: Heal’s test agent executes a structured version of a test scenario, step by step, helping to pinpoint the exact moment a test fails. The test steps provided to the triage agent are detailed actions performed during the test, along with their corresponding execution status, all formatted in our custom JSON structure.
Playwright traces
Application Logs: Logs from our test agent containing crucial failure details such as assertion failures, missing page elements, or timeouts..
Given the potentially large size of this data (sometimes hundreds of MB), the agent intelligently identifies and analyzes only relevant segments.
Once data collection is complete, the Triage Agent preprocesses the information to clearly understand the test's structure, its objectives, and exactly where it failed:
This phase sets the stage by clearly defining expected vs. actual behaviors for subsequent steps.
Next, the agent enters a dynamic investigative phase. Starting from the failing step, an orchestrator intelligently decides the next best investigative step based on earlier findings, and moves backwards through the test steps if further context is needed. If the root cause becomes clear, the orchestrator terminates early, conserving resources and time.
The orchestrator leverages four specialized tools.
This tool compares the Playwright traces from both the successful and failing test runs at specific test steps. It examines console and network logs around critical actions to isolate errors and messages uniquely associated with the failing run.
The agent filters out noise such as third-party script errors or known non-critical issues that also appeared in the passing run, ensuring that attention is focused exclusively on genuine errors directly related to the test failure. This precise log filtering helps clearly identify the true source of the problem.
This tool compares screenshots taken before and after key actions in both passing and failing test runs. It visually highlights potential user interface issues such as missing elements, unexpected pop-ups, layout shifts, or error messages.
This visual verification step can quickly reveal changes or anomalies that may not be immediately evident through logs or network traces alone.
Heal.dev’s test agent is responsible for executing a test scenario step by step and verifying the status of each step after execution. It logs detailed information about its behavior at every stage. The Application Log Analysis tool compares the application logs from both failing and passing runs of the agent. It looks for grounding errors - issues related to locating elements on a web page based on a description - such as incorrect or outdated selectors, or timing-related problems where the test and the website become unsynchronized. For instance, this could happen if the test tries to click a button before it appears or checks a field before the page has fully loaded. By isolating these test execution logs, the agent quickly identifies if the failure stems from test instability or synchronization issues rather than actual website functionality.
The Stop Tool allows the orchestrator to end the analysis process once sufficient evidence has been gathered to identify the root cause. Rather than spending extra resources analyzing unrelated or redundant data, this tool ensures efficiency by terminating the investigative loop immediately after a diagnosis is obtained.
In the final step, the agent summarizes all gathered evidence into a structured, comprehensive, and actionable conclusion.
This conclusion not only clearly describes the precise root cause of the failure but also offers context and reasoning behind the diagnosis. It details the exact nature of the issue, suggests potential fixes or mitigation strategies, and clearly differentiates between genuine site issues and test-related problems.
The objective is to give the human reviewers the right insights to go fix the issue fast.
Here is an example of the agent tool calling for a very simple case: an assertion text changed in the page.
The agent does not stop at the discrepancy found in the page text, but goes back to the previous step to make sure the page state was the same in both runs.
That agent has allowed Heal to unlock:
Heal.dev is a QA agent that helps engineering teams ship higher quality software, faster. Heal's agent generates and maintains end-to-end tests under the supervision of expert humans. When Heal finds a bug, it automatically investigates the root cause to accelerate remediation (using this Triage Agent!).
So if you want to try Heal.dev's embedded QA engineer, or you're interested in working on state of the art agents at the intersection of browsers, code, and user experience, feel free to reach out!