Fast test failure analysis with Heal’s Triage agent

AI engineer

June 9, 2025 · 8 min read

Anyone who's spent time managing automated tests knows the frustration of troubleshooting mysterious test failures. Hours spent sifting through logs, console outputs, network requests, and screenshots can grind productivity to a halt. We believe QA engineers and developers should focus their time on building and enhancing great software - not digging through logs. That’s why we built the Triage Agent: an intelligent assistant designed to pinpoint the exact cause of test failures using advanced large language models (LLMs). These models power most of the analysis, enabling adaptive, context-aware reasoning that mirrors how an experienced QA engineer would approach an investigation - but at lightning speed and scale.

Imagine running your automated tests and immediately knowing not just that a test failed, but exactly why. No more guesswork or digging in logs, just actionable insights.

In this post, we’ll explore how the Triage Agent uncovers the root cause of test failures, providing deeper insights beyond the surface-level error messages.

How Does the Root Cause Analysis Work?

The Triage Agent compares the failing test run with the previous passing run. The goal of this comparison is to filter out irrelevant error messages or behaviors that also appeared in the successful run. Playwright traces are often cluttered with third-party errors or warnings unrelated to actual test failures, which can mislead automated analysis.
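At its core, this comparison is a set difference: keep only the errors from the failing run that have no counterpart in the passing run. The minimal sketch below assumes errors are plain strings. Before comparing, it masks volatile fragments such as hex ids and numbers, so the same third-party error with a different timestamp or id still cancels out. The function names and normalization rules are illustrative assumptions, not Heal's implementation.

```python
import re

# Hypothetical normalization rules: mask volatile fragments so that the
# "same" error from two different runs produces the same signature.
# Hex ids are masked first so their digits are not rewritten separately.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    """Reduce an error message to a run-independent signature."""
    signature = message.strip()
    for pattern, placeholder in _VOLATILE:
        signature = pattern.sub(placeholder, signature)
    return signature


def new_errors(passing: list[str], failing: list[str]) -> list[str]:
    """Errors from the failing run whose signature never occurs in the passing run."""
    baseline = {normalize(m) for m in passing}
    return [m for m in failing if normalize(m) not in baseline]
```

A noisy analytics timeout that occurs in both runs is dropped even though its duration differs. A `TypeError` that only the failing run produced is kept.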

Overview

The Triage Agent systematically processes test failure data through a structured, multi-phase pipeline. This pipeline leverages LLMs extensively, making informed decisions at each step based on the gathered evidence.

Triage agent pipeline

Step 1: Gathering Comprehensive Evidence

It begins by collecting extensive data from both successful and failed test executions, including:

Test Steps: Heal’s test agent executes a structured version of a test scenario step by step, which helps pinpoint the exact moment a test fails. The test steps provided to the Triage Agent are the detailed actions performed during the test, along with their execution status, all formatted in our custom JSON structure.

Playwright Traces:
Console Traces: Complete browser logs, including errors, warnings, and standard messages.
Network Traces: All HTTP/HTTPS requests and responses occurring during test execution.
Screenshots: Visual snapshots captured at critical points of execution.

Application Logs: Logs from our test agent containing crucial failure details such as assertion failures, missing page elements, or timeouts.

Given the potentially large size of this data (sometimes hundreds of MB), the agent intelligently identifies and analyzes only relevant segments.
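The post does not explain how relevant segments are selected. One plausible approach, shown here purely as an assumption, is to sort trace events by timestamp once and then cut a time window around each test step. Each per-step lookup then becomes a binary search rather than a full scan of the trace. The `TraceEvent` record, the `TraceIndex` class, and the `margin_ms` default are hypothetical:

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    # Hypothetical flattened trace record: a timestamp in ms plus a payload.
    timestamp_ms: int
    kind: str  # e.g. "console" or "network"
    text: str


class TraceIndex:
    """Sort a trace once, then cut cheap time windows out of it per step."""

    def __init__(self, events: list[TraceEvent]):
        self._events = sorted(events, key=lambda e: e.timestamp_ms)
        self._keys = [e.timestamp_ms for e in self._events]

    def window(self, start_ms: int, end_ms: int, margin_ms: int = 2000) -> list[TraceEvent]:
        """Events in [start_ms - margin_ms, end_ms + margin_ms], in time order."""
        lo = bisect_left(self._keys, start_ms - margin_ms)
        hi = bisect_right(self._keys, end_ms + margin_ms)
        return self._events[lo:hi]
```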

Step 2: Scenario analysis

Once data collection is complete, the Triage Agent preprocesses the information to clearly understand the test's structure, its objectives, and exactly where it failed:

Step Analysis: Identifies divergence points by comparing step-by-step actions between passing and failing tests, immediately highlighting behavioral differences.
Scenario Summary: Generates a concise, functional overview of the test scenario to guide subsequent analysis.
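Finding a divergence point amounts to walking both step lists in lockstep and reporting the first index where they disagree. Heal's custom JSON step format is not shown in this post, so the `Step` record below (an action string plus a status) is a hypothetical stand-in:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    # Hypothetical shape; the real step format is Heal's custom JSON.
    action: str  # e.g. "click 'Checkout'"
    status: str  # "passed" or "failed"


def divergence_point(passing: list[Step], failing: list[Step]) -> Optional[int]:
    """Index of the first step whose action or status differs between runs.

    Returns None when both runs performed identical steps with identical results.
    """
    for i, (ok, bad) in enumerate(zip(passing, failing)):
        if ok != bad:
            return i
    # All shared steps match, but one run may have stopped early.
    if len(passing) != len(failing):
        return min(len(passing), len(failing))
    return None
```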

This phase sets the stage for the investigation by clearly defining expected vs. actual behavior.

Step 3: Orchestrator

Next, the agent enters a dynamic investigative phase. Starting from the failing step, an orchestrator decides which investigation to run next based on earlier findings, and moves backward through the test steps when it needs more context. Once the root cause is clear, the orchestrator terminates early, saving time and resources.

The orchestrator leverages four specialized tools.
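The control flow of such an orchestrator can be sketched as a bounded tool-calling loop. In this hypothetical sketch, a `policy` function plays the role of the LLM: it chooses the next tool and test step from the findings so far. The Stop tool is modeled as a special `stop` decision, and `max_calls` is an assumed safety budget. The toy run is loosely modeled on the example scenario described later in this post: inspect the failing step, step back once for context, then stop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    tool: str
    step: int
    summary: str


@dataclass
class Decision:
    tool: str            # name of the next tool to call, or "stop"
    step: int = -1       # test step the tool should inspect
    diagnosis: str = ""  # only meaningful when tool == "stop"


# A tool inspects one test step and returns a short textual finding.
Tool = Callable[[int], str]
# The policy stands in for the LLM: it reads every finding so far and decides
# the next action. Here it is an ordinary function so the loop is runnable.
Policy = Callable[[list[Finding]], Decision]


def investigate(tools: dict[str, Tool], policy: Policy, max_calls: int = 10):
    """Run the tool-calling loop until the policy stops or the budget runs out."""
    findings: list[Finding] = []
    for _ in range(max_calls):
        decision = policy(findings)
        if decision.tool == "stop":
            return decision.diagnosis, findings
        summary = tools[decision.tool](decision.step)
        findings.append(Finding(decision.tool, decision.step, summary))
    return "inconclusive: call budget exhausted", findings


# Toy data: step 3 is the failing step; step 2 looks the same in both runs.
FAILING_STEP = 3
_screens = {3: "heading text differs between runs", 2: "no difference"}
toy_tools = {"screenshots": lambda step: _screens.get(step, "no difference")}


def toy_policy(findings: list[Finding]) -> Decision:
    if not findings:
        return Decision("screenshots", FAILING_STEP)  # start at the failure
    if findings[-1].step == FAILING_STEP:
        return Decision("screenshots", FAILING_STEP - 1)  # step back once
    return Decision("stop", diagnosis="page text changed; prior page state identical")
```

Bounding the loop with `max_calls` ensures that a policy that never chooses to stop still terminates. In that case it returns an explicit "inconclusive" result instead of running forever.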

Console & Network Log analysis

This tool compares the Playwright traces from both the successful and failing test runs at specific test steps. It examines console and network logs around critical actions to isolate errors and messages uniquely associated with the failing run.

The agent filters out noise, such as third-party script errors or known non-critical issues that also appeared in the passing run. This keeps attention on errors that are specific to the failing run and are therefore likely related to the test failure. This precise log filtering makes the true source of the problem much easier to identify.

Analyzing traces
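For the network side, one way to express "errors uniquely associated with the failing run" is to key HTTP requests by method, host, and path. Query strings are ignored so that cache busters do not break the match. The sketch then keeps only error responses whose key did not also fail in the passing run. The `Request` record and the default threshold of 400 are illustrative assumptions, not Heal's documented behavior:

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    status: int


def request_key(req: Request) -> tuple[str, str, str]:
    # Ignore query strings (cache busters, timestamps) so equivalent calls match.
    parts = urlsplit(req.url)
    return (req.method.upper(), parts.netloc, parts.path)


def failing_only_errors(passing: list[Request], failing: list[Request], min_status: int = 400) -> list[Request]:
    """HTTP errors in the failing run that did not also fail in the passing run."""
    known_bad = {request_key(r) for r in passing if r.status >= min_status}
    return [r for r in failing if r.status >= min_status and request_key(r) not in known_bad]
```

A tracking pixel that returns 404 in both runs is filtered out. A `/api/cart` call that succeeded before but now returns 500 is kept.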

Screenshots comparison

This tool compares screenshots taken before and after key actions in both passing and failing test runs. It visually highlights potential user interface issues such as missing elements, unexpected pop-ups, layout shifts, or error messages.

This visual verification step can quickly reveal changes or anomalies that may not be immediately evident through logs or network traces alone.

Analyzing screenshots
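The post does not specify how screenshots are compared. As a deliberately simple stand-in, and not Heal's method, a pixel diff can at least show where two screenshots differ. The sketch assumes same-sized grayscale images given as lists of rows with values 0 to 255. The `threshold` of 30 intensity levels is an arbitrary choice:

```python
def diff_region(before, after, threshold=30):
    """Compare two same-sized grayscale images given as lists of rows (0-255).

    Returns (changed_fraction, bbox), where bbox = (top, left, bottom, right)
    of the changed pixels, inclusive, or None when no pixel changed by more
    than `threshold`.
    """
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("screenshots must have identical dimensions")
    changed = [
        (y, x)
        for y, (row_a, row_b) in enumerate(zip(before, after))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) > threshold
    ]
    if not changed:
        return 0.0, None
    total = sum(len(row) for row in before)
    ys = [y for y, _ in changed]
    xs = [x for _, x in changed]
    return len(changed) / total, (min(ys), min(xs), max(ys), max(xs))
```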

Application Log analysis

Heal.dev’s test agent executes a test scenario step by step and verifies the status of each step after execution, logging detailed information about its behavior at every stage. The Application Log Analysis tool compares the agent's application logs from the failing and passing runs. It looks for grounding errors (problems locating an element on a web page from its description), such as incorrect or outdated selectors. It also looks for timing-related problems where the test and the website fall out of sync. For instance, the test might try to click a button before it appears, or check a field before the page has fully loaded. By isolating these test execution logs, the agent can quickly determine whether the failure stems from test instability or synchronization issues rather than from actual website functionality.
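As a rough approximation, test-side problems can be separated from other failures by bucketing the log lines that appear only in the failing run. The regular expressions below are hypothetical, because the agent's real log format is not published. Lines are compared verbatim, which assumes they carry no timestamps:

```python
import re

# Hypothetical patterns; first match wins, so order expresses priority.
CATEGORIES = [
    ("grounding", re.compile(r"no element matches|selector .* not found|could not locate", re.I)),
    ("timing", re.compile(r"timeout|timed out|not (yet )?visible|detached from the DOM", re.I)),
    ("assertion", re.compile(r"assert(ion)? failed|expected .* but (got|found)", re.I)),
]


def classify(line: str) -> str:
    """Label a log line with the first matching category, else "other"."""
    for label, pattern in CATEGORIES:
        if pattern.search(line):
            return label
    return "other"


def failing_run_signals(passing_log: list[str], failing_log: list[str]) -> dict[str, list[str]]:
    """Group the log lines that appear only in the failing run by category."""
    seen = set(passing_log)
    groups: dict[str, list[str]] = {}
    for line in failing_log:
        if line not in seen:
            groups.setdefault(classify(line), []).append(line)
    return groups
```

Under these assumptions, a failing run dominated by "grounding" or "timing" lines points toward a test-side issue. A lone "assertion" line with no grounding or timing noise points toward a change in the site itself.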

Stop tool

The Stop Tool allows the orchestrator to end the analysis process once sufficient evidence has been gathered to identify the root cause. Rather than spending extra resources analyzing unrelated or redundant data, this tool ensures efficiency by terminating the investigative loop immediately after a diagnosis is obtained.

Stop tool

Step 4: Conclusion

In the final step, the agent summarizes all gathered evidence into a structured, comprehensive, and actionable conclusion.

This conclusion not only clearly describes the precise root cause of the failure but also offers context and reasoning behind the diagnosis. It details the exact nature of the issue, suggests potential fixes or mitigation strategies, and clearly differentiates between genuine site issues and test-related problems.

The objective is to give human reviewers the insights they need to fix the issue quickly.

Conclusion
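The post does not describe the exact output schema. As a hypothetical illustration, a conclusion record could carry the elements described above: the root cause, the supporting evidence, suggested fixes, and the site-versus-test distinction. It might look like this:

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class FailureKind(Enum):
    SITE_ISSUE = "site_issue"  # the application under test misbehaved
    TEST_ISSUE = "test_issue"  # the test itself is flaky, stale, or out of sync


@dataclass
class Conclusion:
    root_cause: str
    kind: FailureKind
    evidence: list[str] = field(default_factory=list)
    suggested_fixes: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON so the result can be stored or posted to a tracker."""
        payload = {
            "root_cause": self.root_cause,
            "kind": self.kind.value,
            "evidence": self.evidence,
            "suggested_fixes": self.suggested_fixes,
        }
        return json.dumps(payload, indent=2)
```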

Example scenario

Here is an example of the agent's tool calls for a very simple case: the text checked by an assertion changed on the page.

Triage agent orchestrator

The agent does not stop at the discrepancy found in the page text, but goes back to the previous step to make sure the page state was the same in both runs.

Why Our Approach Stands Out

Deep Comparative Analysis: By directly comparing successful and failed runs, our Triage Agent highlights precisely what has changed or gone wrong. This approach drastically reduces false positives and separates meaningful issues from irrelevant noise.
Multi-modal Data Handling: Combining visual data (screenshots) with textual information (console, network, and application logs) allows the agent to detect subtle differences and pinpoint problems that a single modality might miss, which significantly enhances diagnostic accuracy.
Intelligent, Adaptive Reasoning: Powered by advanced LLMs, our agent dynamically adjusts its investigative path based on initial findings. This adaptability lets the Triage Agent focus on the most promising leads first, significantly reducing the need for manual investigation and rapidly identifying complex issues.
Easily Extendable Framework: Built on a modular architecture, our agent supports straightforward integration of additional analysis tools. This design ensures that it remains future-proof, able to evolve alongside emerging testing scenarios and technologies without extensive re-engineering.
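A common way to get this kind of modularity is a tool registry. Each analysis tool registers itself under a name, and the orchestrator offers whatever is registered to the model. The sketch below is hypothetical: the tool names, the `register_tool` decorator, and the extra `accessibility_tree` tool are illustrations, not part of Heal's codebase.

```python
from typing import Callable

# Hypothetical registry: name -> tool that inspects one test step.
TOOLS: dict[str, Callable[[int], str]] = {}


def register_tool(name: str):
    """Decorator that adds a tool to TOOLS under a unique name."""
    def decorator(fn: Callable[[int], str]) -> Callable[[int], str]:
        if name in TOOLS:
            raise ValueError(f"tool {name!r} already registered")
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("console_network")
def analyze_logs(step: int) -> str:
    return f"log diff for step {step}"


@register_tool("screenshots")
def compare_screenshots(step: int) -> str:
    return f"screenshot diff for step {step}"


# A hypothetical extra tool: adding it is one decorated function,
# and the orchestration loop itself does not change.
@register_tool("accessibility_tree")
def compare_a11y_tree(step: int) -> str:
    return f"accessibility tree diff for step {step}"
```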

Real-World Impact

The Triage Agent has allowed Heal to unlock:

Faster Issue Detection: Analyses complete within a minute, where a typical QA engineer might take more than 10 minutes - accelerating the path from failure to actionable insight and reducing downtime.
Improved Accuracy: Automated triage reduces errors common in manual log reviews, ensuring consistent and systematic issue classification.
Higher Productivity: Less time spent on manual debugging allows teams to focus on tasks like improving test coverage and validating new features.
Enhanced Collaboration: Structured outputs make it easier for developers to align on issues and resolve them quickly.

What we do at Heal

Heal.dev is a QA agent that helps engineering teams ship higher quality software, faster. Heal's agent generates and maintains end-to-end tests under the supervision of expert humans. When Heal finds a bug, it automatically investigates the root cause to accelerate remediation (using this Triage Agent!).

So if you want to try Heal.dev's embedded QA engineer, or you're interested in working on state-of-the-art agents at the intersection of browsers, code, and user experience, feel free to reach out!
