AI Test Case Generation in Jira: A Practical Guide

AI · Test Generation · Anthropic · OpenAI · DeepSeek · Gemini

Half the sprint was just writing tests

This was the thing that bugged us the most before we built TrueStory. Our QA engineer would spend Monday and Tuesday just writing test cases for the sprint's tickets. Not running them. Writing them. Figuring out the happy paths, the edge cases, the "what if the user does something dumb" scenarios.

By Wednesday she'd start actual testing. By Friday we'd realize some tickets didn't get tested at all because there wasn't enough time. It felt like a factory where the workers spend half their shift sharpening tools and half actually building.

We thought: what if AI could write the first draft?

What actually happens when you hit "Generate"

I want to explain this properly because there's a lot of hand-wavy "AI-powered" marketing out there and I want you to know exactly what's going on.

When you click "Generate with AI" on a Jira ticket, TrueStory grabs everything it can about that ticket: the summary, description, acceptance criteria, any existing test cases. Then it takes whatever instruction template you've selected (more on those in a second) and stuffs everything into a single prompt.

That prompt goes to the AI provider you configured. Could be Claude, GPT-4, DeepSeek, Gemini, whatever you picked. The model comes back with a list of suggested test cases, each with a title, severity level, and test steps.
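To make the flow concrete, here's a rough sketch of what that prompt-assembly step might look like. The field names (`summary`, `description`, `acceptanceCriteria`, `existingTests`) and the exact prompt layout are illustrative assumptions, not TrueStory's actual schema:

```typescript
// Hypothetical sketch of the prompt-assembly step: ticket fields plus
// the selected instruction template, joined into one prompt string.
interface Ticket {
  summary: string;
  description: string;
  acceptanceCriteria?: string;
  existingTests?: string[];
}

function buildPrompt(ticket: Ticket, template: string): string {
  const parts = [
    template,
    `Summary: ${ticket.summary}`,
    `Description: ${ticket.description}`,
  ];
  if (ticket.acceptanceCriteria) {
    parts.push(`Acceptance criteria: ${ticket.acceptanceCriteria}`);
  }
  if (ticket.existingTests?.length) {
    // Existing tests go in so the model can avoid suggesting duplicates.
    parts.push(`Existing test cases:\n- ${ticket.existingTests.join("\n- ")}`);
  }
  return parts.join("\n\n");
}
```

The important detail is that everything the model sees comes from the ticket itself plus your template; there's no hidden context being injected.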

You see them in a list. Checkboxes next to each one. You tick the ones you like, maybe edit a title, uncheck the one that's weird, and hit Save. The whole thing takes maybe 90 seconds.

It's not magic. It's a shortcut. The test cases still need a human to look at them and decide if they make sense. But going from "blank page" to "here's 6 reasonable test cases, pick the ones you want" saves a ton of time.

Why we're stubborn about bring-your-own-key

A lot of tools in this space route your data through their own servers. Your Jira ticket descriptions, your acceptance criteria, your internal project details, all passing through some third party's infrastructure before it reaches the AI.

We didn't want to do that. So TrueStory uses a BYOK (bring your own key) model. You enter your own API key for Anthropic, OpenAI, DeepSeek, or Gemini, and the calls go directly from Atlassian's Forge infrastructure to the AI provider. Your data never hits our servers.

Your key is stored in Forge's encrypted secret storage. We literally cannot read it. If you work at a company where the security team has opinions about where data goes (and honestly, they should), this is the approach that lets you use AI without a 6-month procurement review.
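To show what "direct to the provider" means in practice, here's a sketch of the request-building pattern. The endpoints and header conventions below reflect each provider's public API docs at the time of writing, but treat the details as assumptions; TrueStory's actual request code isn't shown here:

```typescript
// Sketch of the BYOK call pattern: the API key goes straight into a
// request aimed at the provider's own endpoint, with no middleman URL.
type Provider = "anthropic" | "openai" | "deepseek" | "gemini";

function buildRequest(
  provider: Provider,
  apiKey: string
): { url: string; headers: Record<string, string> } {
  switch (provider) {
    case "anthropic":
      return {
        url: "https://api.anthropic.com/v1/messages",
        headers: { "x-api-key": apiKey, "anthropic-version": "2023-06-01" },
      };
    case "openai":
      return {
        url: "https://api.openai.com/v1/chat/completions",
        headers: { Authorization: `Bearer ${apiKey}` },
      };
    case "deepseek":
      // DeepSeek exposes an OpenAI-compatible API.
      return {
        url: "https://api.deepseek.com/chat/completions",
        headers: { Authorization: `Bearer ${apiKey}` },
      };
    case "gemini":
      return {
        url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent",
        headers: { "x-goog-api-key": apiKey },
      };
  }
}
```

Notice there's no vendor URL anywhere in that routing table. That's the whole point of BYOK: the only parties that ever see your ticket text are Atlassian and the AI provider you chose.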

The trick is in the templates

Here's something we learned early: the default prompt produces decent test cases, but specific instructions produce great ones.

That's why we added instruction templates. You can save up to 5 per project, each for a different purpose. Our team uses three:

One for functional testing, the bread and butter. "Cover happy paths and error cases, include expected results for each step." This is the default for most tickets.

One for security-focused testing. We pull this out for anything involving authentication, payments, or user data. "Check for auth bypasses, input validation, XSS vectors." The AI produces very different test cases with this template.

And one for regression testing. Broader, shallower. "Focus on existing functionality that might break. Think about side effects." We use this when a ticket touches shared code.

When you click generate, you pick which template to use. Same ticket, different template, completely different test cases.
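The three templates above, and the five-per-project cap, could be modeled like this. The `Template` shape and `addTemplate` helper are assumptions for illustration; only the limit of 5 and the instruction texts come from the article:

```typescript
// Illustrative model of per-project instruction templates.
interface Template {
  name: string;
  instructions: string;
}

const MAX_TEMPLATES_PER_PROJECT = 5;

// Hypothetical helper enforcing the per-project cap.
function addTemplate(existing: Template[], t: Template): Template[] {
  if (existing.length >= MAX_TEMPLATES_PER_PROJECT) {
    throw new Error(`A project holds at most ${MAX_TEMPLATES_PER_PROJECT} templates`);
  }
  return [...existing, t];
}

// The three templates our team uses, as described above.
const templates: Template[] = [
  { name: "Functional", instructions: "Cover happy paths and error cases, include expected results for each step." },
  { name: "Security", instructions: "Check for auth bypasses, input validation, XSS vectors." },
  { name: "Regression", instructions: "Focus on existing functionality that might break. Think about side effects." },
];
```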

A real example from last week

We had a ticket: "Add ability to export session results as CSV." Pretty straightforward.

With our functional template, the AI came back with:

  1. [HIGH] Export completed session: CSV contains all test results with correct statuses
  2. [HIGH] Export session with mixed results (pass/fail/warning), all statuses represented
  3. [MEDIUM] Export session with no results yet, empty or meaningful placeholder
  4. [MEDIUM] CSV includes ticket keys and test case titles
  5. [LOW] Filename includes session ID and date
  6. [LOW] Large session export (50+ test cases), file is complete, no truncation

Our QA engineer kept 1, 2, 3, and 4. Dropped 5 because we hadn't spec'd the filename. Tweaked 6 to say "30+ test cases" because our largest session so far was 34.

Took her about a minute. Writing those from scratch would have been 10-15 minutes, plus the time spent staring at the ticket trying to think of edge cases.
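A list in the format shown above could be turned into structured cases with a small parser. To be clear, the `[SEVERITY] title` line format follows the example output, not a documented spec, and the parser is a hypothetical sketch:

```typescript
// Hypothetical parser for severity-tagged suggestion lines like
// "1. [HIGH] Export completed session: CSV contains all test results".
interface SuggestedCase {
  severity: "HIGH" | "MEDIUM" | "LOW";
  title: string;
}

function parseSuggestions(raw: string): SuggestedCase[] {
  const out: SuggestedCase[] = [];
  for (const line of raw.split("\n")) {
    // Match "<number>. [SEVERITY] <title>", ignoring anything else.
    const m = line.match(/^\s*\d+\.\s*\[(HIGH|MEDIUM|LOW)\]\s*(.+)$/);
    if (m) {
      out.push({ severity: m[1] as SuggestedCase["severity"], title: m[2].trim() });
    }
  }
  return out;
}
```

Lines that don't match are simply dropped, which mirrors the review step: anything malformed never makes it to the checkbox list.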

Honest caveats

The output quality depends heavily on the ticket quality. If your ticket says "fix the bug" with no description, the AI is going to produce garbage. Detailed tickets with acceptance criteria produce useful tests. Vague tickets produce vague tests. Garbage in, garbage out. Same as it ever was.

Also, AI doesn't know your product. It doesn't know that your checkout flow has a quirky redirect on mobile, or that the CSV export needs to handle special characters because your client is in Japan. You still need a human who knows the product to review and adjust.

We treat AI-generated tests as a first draft, never a final product. The value is in skipping the blank-page problem, not in removing human judgment.

Try it yourself

Install TrueStory, add your API key in Settings, and try generating tests on a well-described ticket. You'll see what I mean.

Detailed setup is in the docs.