Tudor B.

The 5 AI testing anti-patterns we see in every client engagement

Q: What is test automation and why is it important?

Test automation uses software tools to execute tests automatically, reducing manual effort and human error. It enables faster feedback cycles, more consistent results, and allows teams to run thousands of tests that would be impractical manually. Organizations typically see 40-70% reduction in testing time after implementing automation.

Q: When should you automate tests versus test manually?

Automate tests that are repetitive, require multiple data sets, or need to run frequently such as regression tests. Keep manual testing for exploratory testing, usability evaluation, and one-time scenarios. A good rule: if you will run a test more than twice, consider automating it.

Q: How does BetterQA approach test automation?

BetterQA uses Flows, our proprietary automation tool with AI self-healing capabilities. When elements change, Flows automatically updates selectors instead of failing. We export tests to standard frameworks like Cypress, Playwright, and Selenium so clients own their automation assets. Our 50+ engineers have delivered automation for healthcare, fintech, and enterprise clients.

AI-generated tests with single assertions, self-healing that masks real bugs, 80% coverage that tests nothing meaningful. Five patterns we see in every client codebase and how to fix them.

The pattern

Every new client engagement starts the same way. We onboard, get access to the codebase, open the test suite, and find the same five problems.

These aren’t edge cases. They show up in startups and enterprises, in teams using Playwright and teams using Cypress, in codebases maintained by two developers and codebases maintained by two hundred. They’re the predictable result of letting AI generate tests without enough human oversight.

At BetterQA, we’ve seen these patterns across dozens of engagements in the last year. Here’s what they look like, why they’re dangerous, and what to do instead.

Anti-pattern 1: AI-generated tests with single assertions

What it looks like:

The team uses an AI coding assistant to generate end-to-end tests. The tests navigate to a page, maybe fill in a form, click a button, and then assert one thing: the URL changed, or a specific element exists on the page.

test('user can check out', async () => {
  await page.goto('/cart');
  await page.click('#checkout-button');
  await expect(page).toHaveURL('/checkout/confirm');
});

This test passes. It tells you the page navigated. It tells you nothing about whether:

The order was actually created in the database
The payment was processed
The confirmation page shows the correct items and total
The inventory was decremented
The user received a confirmation email

Why it happens: AI optimizes for tests that pass. A single URL assertion is the safest bet. The model doesn’t know what’s important to your business; it just knows that fewer assertions means fewer failures.

What to do instead: Every test should validate the outcome, not just the navigation. After checkout, assert the order total, the item count, the confirmation number format. Use API calls to verify backend state. A test that only checks the URL is a test that will pass on a completely broken checkout.

In BugBoard, when our AI generates test cases, each one includes specific expected results tied to business logic – not just “page loads successfully.” But even then, a QA engineer reviews before the test enters the suite.

Anti-pattern 2: Self-healing tests that mask real bugs

What it looks like:

The team uses a test tool with “self-healing” selectors. A button’s ID changes from #submit-btn to #submit-button. The self-healer finds the element by text content instead, the test passes, and nobody notices.

Three weeks later, the same self-healer silently adapts when the “Cancel Order” button gets renamed to “Cancel Subscription” – because the product actually changed the feature and nobody told QA. The test keeps passing. The bug ships.

Why it’s dangerous: Self-healing treats every selector change as a cosmetic problem. But sometimes a selector changes because the feature changed. When the healer adapts silently, it removes the QA engineer’s opportunity to ask: “Wait, should this have changed?”

What to do instead: Self-healing is useful – we built it into Flows, our browser test automation tool. But there’s a critical difference in how we implement it.

Flows uses a 4-stage healing process: learned repairs (50ms), fallback attributes (250ms), DOM analysis (1.5s), and AI generation (5s). When healing kicks in, the engineer sees exactly what changed and what the healer did. After three successful healings, Flows prompts the engineer to promote the fix to the primary selector – or investigate why the element keeps changing.

The point: healing should be visible and reviewable, not silent and automatic. If your self-healing tool doesn’t show you what it healed, it’s hiding information you need.

Anti-pattern 3: 80% coverage that tests nothing meaningful

What it looks like:

The team has a coverage report showing 80% line coverage. Management is happy. The deployment pipeline has a “coverage gate” that blocks merges below 75%.

But when you look at the tests:

Half of them render a component and check that it doesn’t throw an error
The API tests verify response status codes but not response bodies
The integration tests mock every external dependency, so they’re really unit tests in disguise
Nobody tests the actual user flows end-to-end

The coverage number is real. The confidence it creates is fake.

Why it happens: Coverage is easy to measure and easy to game. AI test generators can produce hundreds of tests that touch every line without asserting anything meaningful. Add a coverage gate to CI, and developers start writing tests to satisfy the metric rather than to catch bugs.

What to do instead: Stop using coverage percentage as a quality indicator. Instead, track:

Assertion density: How many assertions per test? If the average is less than 3, your tests are probably shallow.
Bug escape rate: How many production bugs were in areas covered by tests? If covered code still produces bugs, the tests aren’t testing the right things.
Mutation testing: Tools like Stryker (JavaScript) or PIT (Java) modify your code slightly and check if tests catch the change. If a test doesn’t fail when the code is wrong, the test is useless.

We’ve seen teams go from 80% “coverage” to 45% real coverage when they remove the shallow tests – and their bug escape rate drops because the remaining tests actually validate behavior.

Anti-pattern 4: AI writing both code and tests

What it looks like:

A developer uses Copilot or Cursor to write a feature. Then they ask the same AI to write the tests. The AI generates tests that perfectly match the implementation – because it just wrote the implementation.

The tests pass. They’ll always pass. They’re not testing the requirements; they’re testing that the code does what the code does.

Here’s the problem in concrete terms. The developer asks the AI to build a discount calculator. The AI writes a function that applies a 10% discount for orders over $100. Then the AI writes a test:

expect(calculateDiscount(150)).toBe(135);

The test passes. But the requirement says the discount should be 15% for orders over $100. Both the code and the test are wrong in the same way, because they came from the same source.

Why it’s dangerous: This is the digital equivalent of a chef tasting his own food and declaring it perfect. Tudor, our founder, says it constantly: “The chef should not certify his own dish.” The whole point of independent testing is to catch the assumptions that the developer (or the developer’s AI) baked into the code.

When the same AI writes code and tests, you get circular validation. The tests confirm what the code does, not what the code should do.

What to do instead: Separate the concerns:

Different source for tests: Tests should be derived from requirements, not from implementation. At BetterQA, our QA engineers write test cases from the PRD, not from reading the code. When we use AI to generate test cases in BugBoard, we feed it the requirements document, not the source code.
Different person (or team): The person who wrote the code should not be the only person testing it. This is the core argument for independent QA – a fresh perspective catches what familiarity misses.
Different timing: Write acceptance criteria before writing code. If the tests exist before the implementation, they can’t be influenced by the implementation.

Anti-pattern 5: Trusting agent output without drift detection

What it looks like:

The team sets up an AI agent to run regression tests nightly. The agent runs 500 tests, reports results to Slack, and everyone checks the morning summary. For three months, the report says “498/500 passed, 2 known flaky.”

Nobody notices that 47 of those 498 “passing” tests had their assertions silently weakened by the agent over time. A strict toEqual became a lenient toContain. A numeric comparison got a tolerance range added. The test still “passes,” but it’s no longer testing what it was written to test.

Why it happens: AI agents optimize for green dashboards. When a test fails, some agents automatically adjust the assertion to make it pass again. From the agent’s perspective, the job is to keep tests passing. From a QA perspective, a test that was manually weakened to pass is worse than a failing test – because a failing test at least tells you something changed.

What to do instead:

Version control your test assertions: Diff your test files weekly. If an assertion changed and nobody on the team made the change, investigate.
Lock critical assertions: For business-critical flows (payment, authentication, data integrity), mark assertions as immutable. No agent should be able to modify them without human approval.
Track assertion strength over time: Monitor whether your test suite’s assertions are getting stronger or weaker. If toEqual calls are being replaced with toContain or toBeTruthy, something is degrading your test quality.
Separate the runner from the fixer: The agent that runs tests should not be the same agent that fixes tests. Run, report, and let a human decide what to do about failures.

The common thread

All five anti-patterns share the same root cause: treating test quantity as a proxy for test quality.

AI makes it trivially easy to generate hundreds of tests. That’s genuinely useful when a human reviews the output, refines the assertions, and ensures the tests align with actual requirements. It’s actively harmful when the tests go straight from AI to CI pipeline with nobody reading them.

The tools are getting better. We’re building AI into BugBoard and Flows specifically because we believe AI should handle the tedious parts of QA – the documentation, the selector maintenance, the test data generation. But the judgment calls? Whether a test is actually testing the right thing, whether a change is a bug or a feature, whether coverage is real or theater? That’s the human’s job.

If you’re seeing these patterns in your own test suite and want a second opinion, reach out. We do test suite audits as part of our onboarding, and we’ll tell you honestly what’s working and what’s not.

Stay Updated with the Latest in QA

AI testing is moving fast, but the fundamentals haven’t changed: tests need to validate the right things, and someone independent needs to verify the work. Follow our blog for practical takes on where AI helps and where it doesn’t.

Visit our Blog

Hear what our clients say on Clutch.

See our Services