✦Your report stops at data. QAI starts there.✦

AI-powered CI failure analysis

Your CI tells you what failed. QAI tells you why.

Engineering teams waste hours every week debugging CI failures. QAI cuts that time by giving you root cause, history, and a fix — directly on the pull request.

Get started free →
Try it live (no account) →
★ View on GitHub

No credit card required.

Not ready to connect your real pipeline? Fork useqai/demo-shop →

github.com · PR #28 · QAI Agent

QAI Agent · Test Intelligence

17 tests · main ← feature/checkout-redesign

High Risk · Do not merge — investigate failures first

9 passed · 8 failed · 3 clusters

cart.spec.ts › added item appears in cart · RCA: UI changed (85%)

✦ AI Analysis

The price locator matches multiple elements (a paragraph and a span both show $54.95), causing a strict mode violation. Use getByRole, or filter on the font-semibold class.

200 · 0ms
→ /cart · 200 · 12ms
→ POST /api/cart · 201 · 38ms
→ GET /api/price · 500 · 204ms
→ ✕ Test failed · strict mode

✦ AI Fix Suggestion

- page.locator('text=$54.95')
+ page.getByRole('paragraph')
+   .filter({ hasText: '$54.95' })

✦ Ask QAI

@qai-agent why did this test fail?

QAI: The price element is duplicated in the DOM, with a paragraph and a span both rendering $54.95. Playwright's strict mode rejects ambiguous locators.
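The strict-mode failure in the analysis above can be modeled in a few lines. This is a toy, not Playwright's implementation: `FakeElement`, `resolveStrict`, the sample DOM and the error messages are all hypothetical. It only illustrates the rule the analysis relies on, that a locator resolving to more than one element is rejected, while one scoped by role and text resolves to exactly one.

```typescript
// Toy model of Playwright's strict mode. Hypothetical; not Playwright's code.
interface FakeElement {
  role: string; // ARIA role, e.g. 'paragraph'
  text: string; // rendered text content
}

type Predicate = (el: FakeElement) => boolean;

// Resolve a "locator" strictly: exactly one match, otherwise an error.
function resolveStrict(dom: FakeElement[], matches: Predicate): FakeElement {
  const hits = dom.filter(matches);
  if (hits.length > 1) {
    throw new Error(`strict mode violation: resolved to ${hits.length} elements`);
  }
  if (hits.length === 0) {
    throw new Error('no element matched');
  }
  return hits[0];
}

// Hypothetical cart page: the price renders in a paragraph and in a span.
const cartDom: FakeElement[] = [
  { role: 'paragraph', text: '$54.95' },
  { role: 'generic', text: '$54.95' }, // a <span> has no more specific ARIA role
];

// Roughly like page.locator('text=$54.95'): matches both elements.
const byText: Predicate = (el) => el.text.includes('$54.95');

// Roughly like getByRole('paragraph').filter({ hasText: '$54.95' }): matches one.
const byRoleAndText: Predicate = (el) => el.role === 'paragraph' && byText(el);
```

Here `resolveStrict(cartDom, byText)` throws and `resolveStrict(cartDom, byRoleAndText)` returns the paragraph. In Playwright itself the check fires when an action or single-element assertion runs against the locator.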


Add one step to any GitHub Actions workflow · 60 seconds to first result.

uses: useqai/qai-agent@v1

// How it works

From CI failure to merge decision.

QAI runs after your tests. No setup beyond one workflow step.

01 — Ingests

CI fails. QAI reads.

Works with any test suite that emits JUnit XML: Playwright, Jest, pytest, Maven, Go.

71ea1f369 fail

b31514c23 fail

8d64ccce1 fail
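Ingesting relies only on the JUnit XML format, which is why any runner with a JUnit reporter works. As a rough sketch of what reading such a report involves, the code below extracts each test case and whether it failed. It is regex-based for brevity, which only holds for simple, well-formed reports (an assumption; use a real XML parser in practice), and `parseJUnit`, `readAttr` and `TestResult` are hypothetical names, not part of QAI.

```typescript
// Hypothetical shape for one parsed test case.
interface TestResult {
  classname: string;
  name: string;
  failed: boolean;
}

// Read one double-quoted attribute value from a tag's attribute string.
function readAttr(attrs: string, key: string): string {
  const m = new RegExp(`\\b${key}="([^"]*)"`).exec(attrs);
  return m ? m[1] : '';
}

// Simplified JUnit reader: assumes well-formed XML, double-quoted attributes,
// and no '>' inside attribute values.
function parseJUnit(xml: string): TestResult[] {
  const results: TestResult[] = [];
  // Matches <testcase .../> or <testcase ...>body</testcase>.
  const caseRe = /<testcase\b([^>]*?)(?:\/>|>([\s\S]*?)<\/testcase>)/g;
  let m: RegExpExecArray | null;
  while ((m = caseRe.exec(xml)) !== null) {
    const attrs = m[1];
    const body = m[2] ?? ''; // undefined for self-closing test cases
    results.push({
      classname: readAttr(attrs, 'classname'),
      name: readAttr(attrs, 'name'),
      // JUnit marks assertion failures with <failure> and crashes with <error>.
      failed: /<(failure|error)\b/.test(body),
    });
  }
  return results;
}
```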

02 — Clusters

Groups by root cause.

UI_CHANGED · TIMING_FLAKE · BACKEND_ERROR · TEST_BUG — with confidence scores from Playwright traces.
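Once every failure carries one of these labels, forming clusters is a plain group-by. This page doesn't describe how QAI assigns labels or computes confidence, so the sketch below starts from already-classified failures and, as an assumption, takes a cluster's confidence to be the mean of its members' scores. The type and function names are hypothetical.

```typescript
// Root-cause labels as named on this page.
type RootCause = 'UI_CHANGED' | 'TIMING_FLAKE' | 'BACKEND_ERROR' | 'TEST_BUG';

// Hypothetical input: a failure that has already been classified.
interface ClassifiedFailure {
  test: string;
  cause: RootCause;
  confidence: number; // 0..1
}

interface Cluster {
  cause: RootCause;
  tests: string[];
  confidence: number; // assumption: mean of member confidences
}

function clusterByCause(failures: ClassifiedFailure[]): Cluster[] {
  const groups = new Map<RootCause, ClassifiedFailure[]>();
  for (const f of failures) {
    const group = groups.get(f.cause);
    if (group) group.push(f);
    else groups.set(f.cause, [f]);
  }
  // Map preserves insertion order, so clusters appear in first-seen order.
  return Array.from(groups, ([cause, members]) => ({
    cause,
    tests: members.map((m) => m.test),
    confidence: members.reduce((sum, m) => sum + m.confidence, 0) / members.length,
  }));
}
```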

03 — Verdicts

Posts on the PR.

Risk score · merge recommendation · fix suggestion. In the place reviewers already look. No new tab.

PR #28 · High Risk

Do not merge

8 / 17 failed · 4 clusters

✦ Fix suggestion ready · View →

// Features

Everything your team needs to ship confidently.

Clustering & root cause

30 failing tests doesn't mean 30 problems. QAI groups failures by root cause, then explains exactly why — with confidence scores from Playwright traces.

checkout.spec.ts × 4 · TIMING_FLAKE

product-catalog.spec.ts · UI_CHANGED

cart.spec.ts × 2 · TEST_BUG

✦ AI Analysis

beforeEach times out — checkout page not ready before assertions run.

TIMING_FLAKE · 75% confidence

AI fix suggestions

Exact code fix for each failing test — posted directly on the PR. Full context from traces, not generic advice.

checkout.spec.ts › beforeEach · 70% confidence

test.beforeEach(async ({ page }) => {
   await page.goto('/checkout');
+  await page.waitForLoadState('networkidle');
+  await expect(page.getByPlaceholder(/DEM010/i))
+    .toBeVisible();
});

Trends & flakiness

See fail rate over time, a flakiness leaderboard, and cluster history. Know what's been broken for 3 weeks before your standup.

Fail rate per run (last 30 runs), Mar 28 to Apr 25
↗ Improving
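Fail rate per run is simply failed over total. Flakiness needs a definition; the one below (how often a test's outcome flips between consecutive runs) is a common heuristic chosen for illustration, not necessarily the score QAI computes. Under it, a test that fails on every run scores 0, consistent with reading a 0% flaky score as "consistent, reproducible". All names are hypothetical.

```typescript
interface RunSummary {
  passed: number;
  failed: number;
}

// Fail rate for one CI run: failed / total (0 for an empty run).
function failRate(run: RunSummary): number {
  const total = run.passed + run.failed;
  return total === 0 ? 0 : run.failed / total;
}

// Illustrative flakiness heuristic (assumption, not QAI's formula): the share of
// consecutive run pairs in which the test's outcome flipped.
// outcomes[i] is true when the test passed in run i, oldest first.
function flipRate(outcomes: boolean[]): number {
  if (outcomes.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < outcomes.length; i++) {
    if (outcomes[i] !== outcomes[i - 1]) flips++;
  }
  return flips / (outcomes.length - 1);
}

// A "last N runs" window is the tail of the history (n = 0 gives an empty window).
function lastN<T>(history: T[], n = 30): T[] {
  return history.slice(Math.max(0, history.length - n));
}
```

For the 9 passed / 8 failed run shown for PR #28, `failRate` gives 8/17, about 0.47.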

Slack alerts

High-risk PRs fire an alert to your team channel instantly. Risk score, failure count, cluster breakdown, dashboard link.

QAI Agent · APP

🔴 High Risk PR detected
PR #78 · Risk: high (0.60)
20/70 tests failed · 4 clusters
Rec: Do not merge — investigate first
View in QAI Platform →
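QAI sends this alert itself. If you want the same message from your own tooling, Slack incoming webhooks accept a POST with a JSON body containing a `text` field. The sketch below formats the fields shown above and posts them; `PrRiskSummary`, `formatHighRiskAlert`, `postToSlack` and the dashboard URL are hypothetical, and the webhook URL is one you create in your own Slack workspace.

```typescript
// Hypothetical summary of a PR run, mirroring the fields in the alert above.
interface PrRiskSummary {
  pr: number;
  riskLabel: string; // e.g. 'high'
  riskScore: number; // 0..1
  failed: number;
  total: number;
  clusters: number;
  recommendation: string;
  dashboardUrl: string; // hypothetical link target
}

// Build the alert text. <url|label> is Slack's mrkdwn link syntax.
function formatHighRiskAlert(s: PrRiskSummary): string {
  return [
    '🔴 High Risk PR detected',
    `PR #${s.pr} · Risk: ${s.riskLabel} (${s.riskScore.toFixed(2)})`,
    `${s.failed}/${s.total} tests failed · ${s.clusters} clusters`,
    `Rec: ${s.recommendation}`,
    `<${s.dashboardUrl}|View in QAI Platform →>`,
  ].join('\n');
}

// Post to a Slack incoming webhook (JSON body with a `text` field).
async function postToSlack(webhookUrl: string, text: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    throw new Error(`Slack webhook responded with HTTP ${res.status}`);
  }
}
```

With `pr: 78, riskLabel: 'high', riskScore: 0.6, failed: 20, total: 70, clusters: 4`, the formatter yields the `PR #78 · Risk: high (0.60)` and `20/70 tests failed · 4 clusters` lines seen above.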

Ask QAI in your PR

Comment @qai-agent on any PR. Get historical context, severity, frequency data, and fix suggestions — right where you work.

@qai-agent is there an increase in failures vs previous PRs?

Yes — dramatic increase. Compared to previous PRs:

11 failed tests this run
10 unique failure patterns · 20 occurrences each
All failures first seen today — not pre-existing
0% flaky score → consistent, reproducible

Automated rules

Define conditions — occurrences, fail rate, flakiness score, RCA cause — and fire any action automatically when a cluster matches.

When cluster_occurrences ≥ 5 AND TIMING_FLAKE

🐙 GitHub issue → sent/opened

💬 Slack message → sent/opened

📧 Email → sent/opened

🔗 Webhook → fired
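A rule reads as a conjunction of optional thresholds plus a list of actions. The sketch below evaluates the example rule above against a cluster. The `Rule` shape, its field names and the action identifiers are hypothetical, since QAI's rule schema isn't published here, and actually dispatching each action is left out.

```typescript
type RootCause = 'UI_CHANGED' | 'TIMING_FLAKE' | 'BACKEND_ERROR' | 'TEST_BUG';
type Action = 'github_issue' | 'slack' | 'email' | 'webhook';

// Hypothetical per-cluster stats a rule can test against.
interface ClusterStats {
  occurrences: number;
  failRate: number; // 0..1
  flakiness: number; // 0..1
  cause: RootCause;
}

// Hypothetical rule shape: every condition that is set must hold (AND).
interface Rule {
  minOccurrences?: number;
  minFailRate?: number;
  minFlakiness?: number;
  cause?: RootCause;
  actions: Action[];
}

function ruleMatches(rule: Rule, c: ClusterStats): boolean {
  return (
    (rule.minOccurrences === undefined || c.occurrences >= rule.minOccurrences) &&
    (rule.minFailRate === undefined || c.failRate >= rule.minFailRate) &&
    (rule.minFlakiness === undefined || c.flakiness >= rule.minFlakiness) &&
    (rule.cause === undefined || c.cause === rule.cause)
  );
}

// Collect actions from all matching rules, in rule order, without duplicates.
function actionsFor(rules: Rule[], c: ClusterStats): Action[] {
  const out: Action[] = [];
  for (const rule of rules) {
    if (!ruleMatches(rule, c)) continue;
    for (const a of rule.actions) {
      if (!out.includes(a)) out.push(a);
    }
  }
  return out;
}

// The example rule from this page: cluster_occurrences >= 5 AND TIMING_FLAKE.
const flakyClusterRule: Rule = {
  minOccurrences: 5,
  cause: 'TIMING_FLAKE',
  actions: ['github_issue', 'slack', 'email', 'webhook'],
};
```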

// Coverage

Do you know which parts of your app you actually trust?

Not which lines ran — which pages your tests visited, which API calls they made, which paths they actually walked. QAI maps it automatically from Playwright traces.

Coverage Tree

Every page, every path — green vs red. See exactly which flows your tests walk end to end.

Confidence Map

Cell size = test volume, color = risk level. Spot coverage gaps and danger zones at a glance.

Zero instrumentation

Parsed from Playwright traces automatically. Upload and see results — no code changes, no config.

561 unique paths · 77% behavioral coverage · Playwright only

See behavioral coverage →
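One way to think about behavioral coverage: reduce each test's trace to the ordered list of pages it visited, count distinct page-to-page transitions as paths, and measure which known pages were reached at all. These definitions (a path is one transition, coverage is reached pages over known pages) are assumptions for illustration; QAI's own metric may be computed differently, and extracting the visit list from a Playwright trace is out of scope here.

```typescript
// Assumption: each test's trace has been reduced to the pages it visited, in order.
type Visit = string[];

// Assumption: a "path" is one page-to-page transition, e.g. "/cart -> /checkout".
function uniquePaths(visits: Visit[]): string[] {
  const seen = new Set<string>();
  for (const seq of visits) {
    for (let i = 1; i < seq.length; i++) {
      seen.add(`${seq[i - 1]} -> ${seq[i]}`);
    }
  }
  return Array.from(seen); // insertion order: first time each path was walked
}

// Assumption: coverage = share of known pages that at least one test reached.
function pageCoverage(visits: Visit[], knownPages: string[]): number {
  if (knownPages.length === 0) return 0;
  const reached = new Set<string>();
  for (const seq of visits) {
    for (const page of seq) reached.add(page);
  }
  const covered = knownPages.filter((p) => reached.has(p)).length;
  return covered / knownPages.length;
}
```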

// ROI Calculator

How much is broken CI costing you?

Developers on your team: 3 devs
CI runs per week: 20 runs
Minutes debugging per failure: 20m
Average developer hourly rate: $100/hr

* Based on a 60% reduction in debugging time. Adjust the sliders to match your team.

Time wasted on CI: 26h / mo
Cost of wasted time: $2,600/mo
QAI cost: $29/mo
Pays for itself in: 0.6 days
Estimated ROI with QAI: 54×
Estimated monthly savings: $1,560/mo

Start saving — free for 7 days →
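The calculator doesn't publish its formula. The sketch below is one formula that reproduces the default figures above (26h, $2,600, $1,560, 54×, 0.6 days) under stated assumptions: each CI run leaves one failure to debug, a month is 4 weeks (and 30 days for the payback figure), and hours are rounded down before costing. Team size does not enter this version. `estimateRoi` and `RoiInputs` are hypothetical names.

```typescript
// Calculator inputs (team size omitted: this sketch doesn't use it).
interface RoiInputs {
  runsPerWeek: number;
  minutesPerFailure: number;
  hourlyRate: number; // USD per hour
  qaiMonthlyCost: number; // USD per month
  reduction: number; // share of debugging time saved, e.g. 0.6
}

const WEEKS_PER_MONTH = 4; // assumption
const DAYS_PER_MONTH = 30; // assumption

function estimateRoi(i: RoiInputs) {
  // Assumption: one failure to debug per run; hours rounded down.
  const hoursWasted = Math.floor((i.runsPerWeek * WEEKS_PER_MONTH * i.minutesPerFailure) / 60);
  const monthlyCost = hoursWasted * i.hourlyRate;
  const monthlySavings = monthlyCost * i.reduction;
  return {
    hoursWasted,
    monthlyCost,
    monthlySavings,
    roiMultiple: monthlySavings / i.qaiMonthlyCost,
    // Days of savings needed to cover one month of QAI.
    paybackDays: i.qaiMonthlyCost / (monthlySavings / DAYS_PER_MONTH),
  };
}
```

With the defaults, `estimateRoi({ runsPerWeek: 20, minutesPerFailure: 20, hourlyRate: 100, qaiMonthlyCost: 29, reduction: 0.6 })` gives 26 hours, $2,600 of cost, $1,560 saved, an ROI multiple of about 53.8 (displayed as 54×) and a payback of about 0.56 days (displayed as 0.6).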

// Ask QAI

Your PR has a new reviewer.

Ask anything about your failures. QAI answers using your full history — not just the current run.

📊 Historical context

“How long has this been failing?” — First seen March 18th. 11 occurrences. Never flaky — consistently broken.

📉 Trend comparison

“Is this worse than previous PRs?” — Previous: 7 occurrences. This PR: 20. Nearly 3× increase. Breaking change introduced.

🧰 Fix on demand

“What's the fastest fix?” — Exact code generated. One click to apply. No dashboard required.

github.com · PR #31 · comment thread

@tomerli · just now

@qai-agent is there an increase in failures compared to previous PRs?

QAI Agent · bot · 4s

Yes, there's a dramatic increase in failures compared to previous PRs.

Previous PRs: 7 occurrences of these patterns.
This PR: 20 occurrences each — nearly 3× increase.

→ UI_CHANGED failures (4 tests) — 85% confidence
→ TIMING_FLAKE failures (4 tests) — 70% confidence
→ 0% flaky score — consistent, reproducible failures

Verdict: Block merge — systematic failures introduced by this PR.

// Get started

Ship green. Stay green.

Drop QAI into any GitHub Actions workflow. The next failed run gets clustered, explained, and resolved before standup.

# Add after your test step
- name: QAI Agent
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'
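The `junit-path` input only helps if your test run actually writes that file. For Playwright, the built-in `junit` reporter can do it; this config is a minimal sketch that assumes the default path used in the step above and keeps the `list` reporter for console output.

```typescript
// playwright.config.ts (minimal sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['list'], // human-readable output in the CI log
    // JUnit XML at the path the QAI step's junit-path points to.
    ['junit', { outputFile: 'test-results/results.xml' }],
  ],
});
```

Other runners need their own JUnit reporter configured to write the same path.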

Create free account → · View on GitHub

— or —

Not ready to connect your real pipeline? Fork demo-shop → QAI is already wired up. See what it does first.