How we use AI to generate test cases in 30 seconds (and why a human still reviews them)

BugBoard generates test cases from requirements, screenshots, and bug history using AI. Here is how it works, what can go wrong without human review, and why we do not trust 100% pass rates.

The pitch vs. the reality

Every AI testing tool on the market right now makes the same promise: paste in your requirements, get test cases instantly. Some go further and claim those tests run themselves, pass themselves, and somehow guarantee your software works.

We’ve been building AI into our QA workflow for two years. We know what it’s good at. We also know exactly where it breaks down.

This is a walkthrough of how we actually use AI test generation at BetterQA – what the workflow looks like inside BugBoard, what the AI produces, and why we still put a human between the AI output and the test suite.

What BugBoard actually does

BugBoard is our test management platform. It has three AI generation features that our QA engineers use daily:

1. Requirements to test cases

You paste a feature description or requirement – up to 50,000 characters of spec text. The AI generates test cases with steps, expected results, preconditions, and priority levels.

Default output: 5 test cases. You can request up to 20 per generation. For a checkout flow, that means 20 test cases covering the happy path, edge cases, validation failures, and state transitions – generated in under 30 seconds.

Each test case comes with:

  • A clear title and description
  • Numbered steps with expected results
  • Priority level (critical, high, medium, low)
  • Preconditions the tester needs to set up
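To make that output concrete, here is a minimal sketch of the structure each generated test case follows. This is an illustration in Python, not BugBoard's actual schema – the class and field names are our assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum

class Priority(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class TestStep:
    action: str           # what the tester does
    expected_result: str  # what they should observe

@dataclass
class TestCase:
    title: str
    description: str
    priority: Priority
    preconditions: list[str] = field(default_factory=list)
    steps: list[TestStep] = field(default_factory=list)

# Example: one generated case for a checkout flow
case = TestCase(
    title="Checkout rejects an expired card",
    description="Validation failure path for the payment step",
    priority=Priority.HIGH,
    preconditions=["User is logged in", "Cart contains at least one item"],
    steps=[
        TestStep("Enter an expired card number and submit",
                 "Inline error is shown; order is not created"),
    ],
)
```

Structuring test cases this way is what makes the later review steps (duplicate detection, feasibility checks) possible at all.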

2. Screenshot to bug reports

Upload a screenshot of your UI (PNG, JPG, WebP, up to 10MB). The AI analyzes the image – identifies layout issues, truncated text, inconsistent styling, incorrect data display – and drafts a structured bug report with severity, steps to reproduce, and expected vs. actual behavior.

Documenting what they see is where QA engineers actually spend their time. The AI handles the formatting so the engineer can focus on finding the next issue.

3. PRD to test plan

Paste an entire product requirements document – up to 100,000 characters. The AI breaks it into a structured test plan: scope, strategy, features to test by priority, test scenarios, acceptance criteria, and risk areas. Optional effort estimates for each section.

This used to take a senior QA engineer a full day. Now it takes two minutes to generate and an hour to review and refine.

The trust problem nobody talks about

Here’s what concerns us about the AI testing space right now.

Some tools generate tests, run them, and report results – all without a human looking at what the tests actually validate. The dashboard shows a 95% pass rate. Everyone feels good. The product ships.

Then users find the bugs.

How? Because AI-generated tests have predictable failure modes:

Tests that check the wrong thing. The AI writes a test for a login form. It verifies that the page loads and the submit button exists. It doesn’t verify that entering wrong credentials shows an error message. It doesn’t check that the session token is set correctly. The test passes. The login is broken.

Tests that duplicate each other. Without context about existing test coverage, AI generates overlapping tests. You end up with 15 variations of “user can log in” and zero tests for “user session expires after 30 minutes of inactivity.”

Tests with impossible preconditions. The AI doesn’t know your staging environment. It writes a test that requires a specific user account, a specific product in the database, and a specific payment gateway configuration. Nobody can run it without modification.

Tests that pass on broken code. This is the dangerous one. If your test only asserts that an API returns a 200 status code, it will pass even when the response body is completely wrong. AI defaults to shallow assertions unless specifically prompted otherwise.

Why we keep the human in the loop

At BugBoard, the AI generates. The human validates. Always.

When our engineers use the test case generator, they review every output before it enters the test suite. They’re checking for:

  • Relevance: Does this test case actually matter for this feature?
  • Completeness: Are the steps detailed enough that another engineer could execute them?
  • Depth: Is the test checking the right things, or just surface-level behavior?
  • Duplicates: Does this overlap with existing test cases? BugBoard checks for duplicates automatically, but the engineer makes the final call.
  • Feasibility: Can this test actually be executed in our environment?
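The duplicate check in the list above can be approximated with a fuzzy title comparison. This is a naive sketch only – BugBoard's actual detection logic is not public, and the use of difflib with a 0.8 threshold is our assumption for illustration:

```python
from difflib import SequenceMatcher

def likely_duplicates(new_title: str, existing_titles: list[str],
                      threshold: float = 0.8) -> list[str]:
    """Flag existing test case titles that look like near-copies."""
    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [t for t in existing_titles
            if similarity(new_title, t) >= threshold]

existing = [
    "User can log in with valid credentials",
    "User session expires after 30 minutes of inactivity",
]
flags = likely_duplicates("User logs in with valid credentials", existing)
# flags contains the first title; the engineer makes the final call.
```

A real implementation would also compare steps and preconditions, not just titles. The point stands either way: the tool surfaces candidates, the engineer decides.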

The AI is a draft generator. The engineer is the editor.

Tudor Brad, our founder, puts it this way: “AI will replace development before it replaces QA.” Development is about producing output. QA is about judgment – understanding what could go wrong, what matters to the user, what the spec doesn’t say explicitly. That’s the part AI can’t do yet.

What the workflow looks like in practice

A typical engagement with a new client goes like this:

Week 1: The client shares their PRD or feature specs. We feed them into BugBoard’s test plan generator. In 30 minutes, we have a draft test plan covering all major features, risk areas, and testing strategies. A senior QA engineer reviews it, removes irrelevant sections, adds domain-specific scenarios the AI missed, and shares it with the client for alignment.

Ongoing: As features ship, engineers paste requirements into the test case generator. They get 10-20 test cases per feature as a starting point, then add the edge cases that require product knowledge: “What happens when the user’s subscription expires mid-checkout?” “What if the API returns a 200 but with an empty array instead of a 404?”

Bug triage: When a bug comes in as a screenshot, the AI drafts the report. The engineer verifies severity, adds reproduction steps specific to the environment, and links it to the affected test cases.

The AI saves our engineers roughly 40% of the documentation time. That time goes into exploratory testing, the kind of creative, contextual work that finds the bugs AI misses.

What we’d change about the industry

The AI testing market has a credibility problem. Tools that promise “autonomous testing” and “zero-maintenance test suites” are setting expectations that the technology can’t deliver yet.

Here’s what we wish more vendors would admit:

  1. AI-generated tests need human review. Every time. No exceptions.
  2. Pass rates mean nothing without assertion quality. A test that checks status === 200 and nothing else will always pass. That doesn’t mean your software works.
  3. Coverage numbers lie. 80% code coverage with shallow assertions is worse than 40% coverage with thorough validation.
  4. Self-healing is a band-aid. When a selector changes, sometimes the right answer is to update the test. Sometimes the right answer is to file a bug because the UI changed unexpectedly. Auto-healing treats both cases the same way.
  5. The chef should not certify their own dish. When AI writes both the code and the tests, there’s no independent validation. You need a separate perspective – which is exactly what independent QA provides.

Try it yourself

BugBoard is free to start. Sign up at bugboard.co, paste a feature requirement, and see what the AI generates. Then review it critically. That’s the workflow.

If you want to see how we integrate AI test generation into a full QA engagement, check out our tools or talk to our team.


Stay Updated with the Latest in QA

AI is changing how we test software, but it hasn’t changed why we test it. Follow our blog for practical insights on integrating AI into real QA workflows – no hype, no “autonomous testing” promises.


Need help with software testing?

BetterQA provides independent QA services with 50+ engineers across manual testing, automation, security audits, and performance testing.
