Every Test Passed. Nothing Worked.
We automated an entire project using Playwright and Claude. The AI generated the tests, ran them, and reported that everything passed. Green across the board. Then we opened the application and started clicking. The login flow was broken. A payment form submitted empty data. Navigation links pointed to pages that did not exist.
The AI had tested its own assumptions about how the application should behave. It never tested how a real person would actually use it.
"The idea with QA is to ensure the product works well for the functionality that was designed for the people who will use it. These people won't use Playwright and MCP to use this project - they'll use the buttons, the UI, the flows, the functionality."
Tudor Brad, Founder, BetterQA
5 Things AI Test Automation Gets Wrong
AI can generate hundreds of test cases in minutes. The problem is not speed. The problem is that AI tests what it thinks matters, not what actually matters to users.
It Tests the DOM, Not Behavior
AI looks at the page structure and generates tests based on what elements exist. It clicks buttons because they are there, not because a user would click them in that order. A "Submit" button that fires successfully at the DOM level can still fail at the UX level if the form validation is broken.
It Cannot Judge Visual Quality
A truncated label, an overlapping element, text that overflows its container on mobile - these are visible to a person in under a second. AI sees that the element rendered and calls it a pass. The user sees a broken page.
It Generates Happy Path Tests
AI defaults to valid inputs and expected flows. It fills in a form correctly and verifies it submits. It does not try submitting with a missing field, an emoji in the phone number, or clicking "Back" halfway through a checkout. Real users do all of these things.
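A concrete way to see the difference: the happy-path input below is the one AI-generated tests exercise; the edge cases are what real users type. The validator is a hypothetical example, not any product's actual rule.

```python
import re

def looks_like_phone(value: str) -> bool:
    """Hypothetical phone validator: 7-15 digits after stripping non-digits."""
    digits = re.sub(r"\D", "", value)
    return 7 <= len(digits) <= 15

happy_path = ["+1 555 867 5309"]
edge_cases = [
    "",                         # missing field
    "😀",                       # emoji instead of a number
    " " * 10,                   # whitespace only
    "5" * 40,                   # absurdly long
    "'; DROP TABLE users;--",   # hostile input
]

# The happy path alone proves almost nothing...
assert all(looks_like_phone(v) for v in happy_path)
# ...the real coverage is verifying every edge case is rejected.
assert not any(looks_like_phone(v) for v in edge_cases)
```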
It Hallucinates Assertions
Ask AI to verify a page loaded and it might assert that an element exists - even when that element contains wrong data. We have seen AI-generated tests that verified a login page loaded successfully while the login itself was returning a 500 error. The test passed because the error page had a DOM.
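The failure mode can be reduced to a minimal sketch (a fake response object, not a real HTTP client): the error page still renders a DOM, so an existence-only assertion passes while the login is actually broken.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """Stand-in for an HTTP response (illustrative only)."""
    status_code: int
    body: str

# The server errored, but the error page still contains the login markup.
resp = Response(status_code=500, body="<h1>Login</h1><p>Internal error</p>")

# Hallucinated assertion: "the login page loaded" -- passes on a 500.
assert "<h1>Login</h1>" in resp.body

def login_succeeded(r: Response) -> bool:
    # Meaningful assertion: check the contract, not just the markup.
    return r.status_code == 200 and "Internal error" not in r.body

assert not login_succeeded(resp)  # this is the check that catches the bug
```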
It Has No Project Memory
A QA engineer who has worked on a project for 6 months knows where the bugs hide. They know the payment module breaks when a discount code and a gift card are applied together. AI starts fresh every time. It has no context about what has failed before or where risk concentrates.
Raw AI Automation vs. Human-Guided AI
The question is not whether to use AI in testing. It is whether AI runs the process or supports the humans who run it.
| Dimension | Raw AI ("just prompt it") | Human-Guided AI (Agentic QA) |
|---|---|---|
| Test source | AI reads DOM and generates tests | Tests start from real bugs, requirements, and user flows |
| Context | None - fresh start every run | Project history, past bugs, known risk areas |
| Edge cases | Happy path only | QA engineers define edge cases, AI executes them |
| Visual bugs | Invisible - DOM says it rendered | AI flags visual anomalies, humans verify |
| Maintenance | Tests break when UI changes - regenerate everything | Self-healing locators adapt to UI changes automatically |
| Trust level | Low - "all green" means nothing | High - human validates before release |
| Cost per bug found | Cheap to run, expensive when bugs escape to production | Higher test cost, far lower production incident cost |
How Agentic QA Actually Works
Agentic QA does not replace the human tester. It gives the human tester AI-powered tools that handle the repetitive work while keeping humans in control of the decisions that matter.
Start From Real Defects
Upload a screenshot or a log to BugBoard. AI analyzes the context and generates a detailed bug report. You review it, approve it, and push it to Jira or keep it in BugBoard. No manual copy-paste between tools.
Generate Tests From Bugs
The AI test designer reads your project's bug history and creates test cases based on what has actually gone wrong - not what it imagines might go wrong. You review the titles, remove duplicates, and approve before they are added to the test suite.
Record, Don't Generate
Flows records what a real person does in the browser - actual clicks, actual navigation, actual form submissions. Then it replays that exact flow. If the UI changes, self-healing locators find the new selectors automatically instead of breaking.
AI Refines, Humans Decide
Use AI to find duplicate test cases, suggest edge cases you might have missed, or improve test steps that are too vague. The AI augments your test suite - it does not own it. Every change is reviewed by a QA engineer before it goes live.
Export to Any Framework
Tests recorded in Flows export to Selenium, Playwright, or Cypress. Run them in CI/CD, in Git, or through the BugBoard UI. The test data stays in your pipeline, not locked in a proprietary tool.
Report With Context
Test execution reports show what passed, what failed, execution time, and bug details - all linked back to the original requirements. An AI summary gives stakeholders a quick read on project health without digging through raw data.
What Enterprise Teams Are Learning
In conversations with QA leaders at enterprise organizations - teams of 100+ engineers with 4-5 parallel projects - the same pattern keeps emerging.
AI Quality Gates Still Need Humans
Large banks and enterprises are building AI-powered quality gates into their CI/CD pipelines - automated checks that decide whether a build can proceed. But the final approval still comes from a human. The AI proposes, the human disposes.
Compliance Requires Human Judgment
When it comes to regulatory compliance - GDPR, FDA Part 11, EU AI Act - nobody wants to let a GPT decide what requirements apply. AI can suggest which compliance checks are relevant, but a human must validate that the checklist is correct.
Testing Is Converging With Development
Modern QA looks more like development - Playwright, Visual Studio Code, MCP servers. But the goal is different. Developers build features. QA validates that those features work for the people who use them. Same tools, fundamentally different perspective.
BetterQA's Agentic QA Pipeline
AI handles the boring parts. Humans handle the decisions. Every step has a checkpoint where a person reviews before moving forward.
Tester Finds a Bug
Screenshot, log, or screen recording uploaded to BugBoard
AI Analyzes and Generates Bug Report
AI reads project context, identifies issues in the screenshot, creates structured report
Tester Reviews and Approves
Verify AI's analysis is correct, push to Jira or BugBoard
AI Generates Test Cases From Bug History
Using accumulated project context, AI creates test cases that cover the bug and related scenarios
Tester Reviews Test Cases
Check titles, remove duplicates, approve before adding to test suite
Automated Execution With Self-Healing
Tests run in CI/CD. When locators break, self-healing finds alternative selectors instead of failing
QA Lead Reviews Results and Signs Off
Human makes the go/no-go decision. AI provides the data, humans provide the judgment.
Frequently Asked Questions
Is QA automation dead?
No. QA automation is more important than ever. What is dying is the idea that you can replace a QA team with a prompt. AI makes automation faster and more accessible, but the fundamentals have not changed: you need to know what to test, how to validate results, and when to ship.
Can AI replace manual testing entirely?
AI can automate repetitive regression testing effectively. It cannot replace exploratory testing, usability validation, or the judgment required to decide whether a feature actually works for end users. The best results come from combining AI automation with targeted manual testing on high-risk areas.
What is agentic QA?
Agentic QA uses AI agents in testing pipelines to automate repetitive tasks - bug report creation, test case generation, test maintenance, and reporting. The key difference from pure AI automation is that humans stay in the loop at every decision point. AI proposes, humans validate.
How much time does AI save in QA?
From our experience with 50+ engineers: bug reporting goes from 10-15 minutes to under 5 minutes. Test case creation drops from hours of manual writing to minutes of AI generation plus review. Self-healing locators cut the time spent fixing broken tests by 60-70%. The time savings are real, but only when AI is used as a tool, not as a replacement.
What tools does BetterQA use for agentic QA?
BugBoard for AI-powered bug reporting and test case management. Flows for browser automation recording with self-healing. BetterFlow for transparent time tracking. Auditi for compliance auditing. All tools are built in-house and included with our QA services.
Should I use Playwright MCP or a dedicated QA tool?
Playwright MCP is powerful for developers who want to generate tests quickly. But it generates tests based on DOM structure, not user intent. For QA purposes, you need tools that start from real user flows and real defects, then validate from the user's perspective - not from the code's perspective. The two approaches complement each other.
Stop Trusting Green Checkmarks
Our 50+ engineers use AI to move faster, not to replace judgment. BugBoard, Flows, and human oversight - included with every engagement. ISO 9001 certified.
BOOK A CONSULTATION