Wedge

Why "Can't Reproduce" Is the Most Expensive Sentence in Your Backlog

May 14, 2026 · 10 min read

The sentence you've definitely typed

A user emails you on a Tuesday morning. The subject line is "doesn't work." The body is three short sentences:

Hey, the export button doesn't do anything when I click it. Was working fine yesterday. Can you check?

You open staging. You sign in as your test user. You click the export button. A CSV downloads. You click it again. A CSV downloads. You try Chrome incognito. CSV. You try Firefox. CSV. You sigh, switch back to the support tab, and type something like this:

Hey! Thanks for flagging — I just tested the export on my end and it's working as expected. Could you let me know which browser you're on, and whether you see any errors in the console (Cmd+Opt+J on Mac)? Happy to dig in once I can reproduce.

You hit send. You feel mildly productive. You close the tab.

They never reply.

Three weeks later, a different user emails. Same button. Same nothing. You open staging again. The CSV downloads again. You feel the small, specific dread of an engineer who knows there's a real bug somewhere and zero leverage to find it.

This is the sentence we're going to talk about. "Can't reproduce." It feels like a polite, professional response. It is, in fact, the most expensive sentence in your backlog.

What "can't repro" actually costs you

Let's do the math out loud, because everyone feels this and almost no one tracks it.

Take an average "can't repro" loop. A bug report comes in. You spend 10 minutes trying to reproduce it. You write a clarifying reply. Two days later — if you're lucky — the user replies with one of the missing pieces. You try again. Another 10 minutes. Another reply asking for the console output. Another two days. By the time you've actually reproduced the bug, you've spent four follow-ups × 20 minutes of context-switch each = ~80 minutes per ticket.

Now stack that. A post-PMF SaaS with maybe 800 active accounts sees, conservatively, eight bugs per month that aren't immediately reproducible. That's:

~10 to 11 hours of engineering time per month evaporated on round-trips (8 tickets × ~80 minutes)
4–6 calendar days of lag between report and a fix landing
At least one bug per month that gets quietly closed because the user ghosted you
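
For anyone who wants to plug in their own numbers, the estimate above can be written out as a few lines of Python. This is only a sketch of the arithmetic: the constants are the figures quoted in this section, and the function and variable names are made up for illustration.

```python
# Figures quoted in the estimate above; the names are illustrative only.
FOLLOW_UPS_PER_TICKET = 4          # round-trips before the bug is reproduced
MINUTES_PER_FOLLOW_UP = 20         # repro attempt plus context switch
UNREPRODUCIBLE_BUGS_PER_MONTH = 8  # bugs that aren't immediately reproducible


def monthly_repro_cost_hours(follow_ups: int, minutes_each: int, bugs: int) -> float:
    """Return the hours per month spent on reproduction round-trips."""
    minutes_per_ticket = follow_ups * minutes_each
    return minutes_per_ticket * bugs / 60


minutes_per_ticket = FOLLOW_UPS_PER_TICKET * MINUTES_PER_FOLLOW_UP
hours_per_month = monthly_repro_cost_hours(
    FOLLOW_UPS_PER_TICKET, MINUTES_PER_FOLLOW_UP, UNREPRODUCIBLE_BUGS_PER_MONTH
)

print(f"{minutes_per_ticket} minutes per ticket")
print(f"{hours_per_month:.1f} hours per month")
```

With the figures above this prints 80 minutes per ticket and 10.7 hours per month. Swap in your own ticket volume and follow-up count to see where your team lands.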

But the engineering hours are the cheap part. The real cost is on the user side. Their perception of your response time isn't "you replied in 4 hours"; it's "this took two weeks to fix and they kept asking me questions." Your NPS surveys catch a faint echo of it. Your retention dashboard catches the rest, except by then it shows up as "stopped logging in" with no flag, no reason code, no ticket linked. You never get to write the postmortem because there's nothing to write a postmortem against.

Worse: the bugs that survive this gauntlet are exactly the wrong ones to survive. Easy-to-reproduce bugs get fixed fast. Hard-to-reproduce bugs — race conditions, state-dependent failures, the ones that hurt your most engaged users — stay open longest. You are, structurally, optimizing your bug fixing for the bugs that matter least.

This isn't a process problem. It's a leverage problem. Every bug report that arrives without enough information to reproduce is a small tax on your week, and the tax compounds.

Why reproduction is hard, specifically

"Can't reproduce" isn't one problem. It's five, stacked. If you've spent any time staring at a vague support ticket, you've felt at least three of them in the same day.

Missing environment

The user is on Safari 17 on an M2 MacBook with system dark mode on and a corporate VPN routing through Frankfurt. You're on Chrome on Linux, no VPN, light mode. Half of "doesn't work" bugs are browser-specific, viewport-specific, or extension-specific. One user's Honey extension once broke an entire checkout for a week. You will never guess this from the email.

Missing auth state

The user is signed in as an admin on a workspace with 47 teammates, a custom SSO setup, and a Stripe subscription that's been past-due for six days. You're signed in as `test+qa@yourdomain.com` on a fresh workspace with one project and no billing state at all. The bug lives in the seam between role, plan, and feature flag. You can't reproduce it because your test account is a different shape of user.
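
One way to make the environment and auth gaps concrete is to write both accounts down as plain dictionaries and diff them. The sketch below does that in Python. The field names and the `fingerprint_diff` helper are hypothetical, and the values are the ones from the examples above (the macOS entry is inferred from the MacBook).

```python
from __future__ import annotations

from typing import Any

NOT_RECORDED = "<not recorded>"


def fingerprint_diff(user: dict[str, Any], test_account: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return every field whose value differs, as {field: (user_value, test_value)}."""
    fields = sorted(set(user) | set(test_account))
    return {
        name: (user.get(name, NOT_RECORDED), test_account.get(name, NOT_RECORDED))
        for name in fields
        if user.get(name, NOT_RECORDED) != test_account.get(name, NOT_RECORDED)
    }


# Hypothetical field names; values come from the scenarios described above.
reporting_user = {
    "browser": "Safari 17",
    "os": "macOS",
    "color_scheme": "dark",
    "vpn": True,
    "role": "admin",
    "teammates": 47,
    "billing": "past due",
}
test_user = {
    "browser": "Chrome",
    "os": "Linux",
    "color_scheme": "light",
    "vpn": False,
    "teammates": 0,
}

for name, (theirs, yours) in fingerprint_diff(reporting_user, test_user).items():
    print(f"{name}: user={theirs!r} test account={yours!r}")
```

Every line this prints is a candidate explanation for why the bug shows up for the user and not for you. Fields one side never recorded are flagged rather than silently skipped, since "we don't know" is itself a difference worth chasing.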

Missing sequence

The user went through six steps in a specific order. They created a project, invited a teammate, the teammate accepted, the user changed plans, went back to the project, then clicked export. The bug is at step six, but it only exists because of steps one through five. You clicked export in isolation. Of course it worked.

Missing data state

The user has 12,000 rows in the table they're trying to export. You have four. The export silently times out at 8,000 rows because of a query you forgot you wrote eighteen months ago. You will never hit this with seed data.
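
If you suspect a data-volume bug like this one, you can bisect on row count instead of guessing. Here is a minimal sketch in Python, assuming the export passes for every row count below some threshold and fails at or above it. `simulated_export` is a stand-in that mimics the 8,000-row example; in practice it would seed staging with that many rows and call your real export.

```python
from typing import Callable


def find_failure_threshold(export_ok: Callable[[int], bool], known_good: int, known_bad: int) -> int:
    """Return the smallest row count at which export_ok() returns False.

    Assumes the export succeeds below the threshold and fails at or above it.
    """
    if not export_ok(known_good) or export_ok(known_bad):
        raise ValueError("need one passing and one failing row count to start from")
    lo, hi = known_good, known_bad  # invariant: export_ok(lo) is True, export_ok(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if export_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi


def simulated_export(rows: int) -> bool:
    """Hypothetical stand-in for a real export; fails at 8,000 rows as in the example."""
    return rows < 8_000


threshold = find_failure_threshold(simulated_export, known_good=4, known_bad=12_000)
print(threshold)
```

Starting from your four seed rows and the user's 12,000, this narrows the failure down to 8,000 rows in about a dozen export attempts rather than thousands.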

Missing expected-vs-actual contrast

The user said "doesn't work." You don't know if they expected a download, a modal, a preview, or an email. Maybe the button works perfectly and they're looking for the wrong outcome. Maybe it half-works. Without an explicit "I expected X, I got Y," you're guessing at the shape of the bug before you even start hunting it.

Reproduction is hard because a bug report is a four-dimensional event compressed into three lines of natural language. The user shipped you the headline. You need the article, the photos, and the eyewitness account.

The three things teams currently do, and why each one fails

You already know the playbook, because you've tried all three. Let's be honest about what each one actually costs.

Ping the user for more details

This is the default move. It feels collaborative. It is, in practice, a slow leak.

The reply rate on "could you tell me more?" emails to support tickets sits somewhere around 40–60% in our experience. Half your follow-ups go into the void. The ones that do come back arrive 1–3 business days later, and they often answer the wrong question — you asked for the browser, they sent you a screenshot of the URL bar. Now you need another round.

There's also a goodwill cost no one talks about. Each follow-up email signals, subtly, that the user has to do work to get their bug fixed. The third one signals that maybe they should switch products. The users patient enough to keep replying are often your most invested ones, and every extra round spends a little more of their goodwill.

Write a Playwright test for it

The engineering-culture response: "We should have a regression test for this." Lovely instinct. Doesn't apply here.

To write a Playwright test that catches the bug, you have to know exactly what the bug is. You'd need the sequence, the auth state, the data shape, and the failing assertion. If you had all of that, you wouldn't be stuck on "can't reproduce" in the first place — you'd just fix it.

Tests catch regressions of known bugs. They don't find bugs you've never seen.

Close it and hope

The honest one. You mark the ticket "needs more info" and let it sit for two weeks. Slack auto-archives it. The user, who long since stopped expecting a reply, moves on. The bug stays open in the codebase, invisibly.

This works fine — for a while. Then your retention chart shows a quiet dip in week-eight engagement. Then a CSV export job fails for an account paying you $400/month. Then someone in your team Slack types "wait, didn't someone report this in March?" and you go searching through Zendesk and find seven tickets, all closed, all about the same button.

Closing tickets you can't reproduce doesn't make the bugs go away. It moves them from your sprint board onto your churn dashboard, where they're much harder to read and much more expensive to fix.

This is the bigger pattern we wrote about in "QA without a QA team": the structural cost of having no one whose job is just to make support tickets into engineering-ready bugs. When that job falls on engineers, the cost is paid in hours. When no one does it at all, the cost is paid in churn.

What "verified" should mean before a bug enters your sprint

Here's the thing the three approaches above all have in common: they treat the support ticket as if it might already be a bug report. It isn't. It's a hint of a bug report.

A real bug report — the kind you'd actually want sitting in your Linear board — has five things in it. Anything less is a wish dressed up as a ticket.

1. Reproduction steps that someone actually ran. Not "the user said they did X then Y." Steps that an agent or a person executed against your staging environment, in order, and watched fail. If the steps haven't been run, they're guesses.

2. A session replay or video of the failure. You should be able to watch the bug. Not because video is fancy, but because every bug report is missing details the reporter didn't think to mention. Watching a 30-second replay tells you what the user clicked, where they paused, what they were looking at when the page didn't respond. (This is also where replay alone falls short; we wrote about why in "session replay vs bug reproduction.")

3. Console output and network activity at the moment of failure. The 500. The CORS error. The `undefined is not a function`. The 17-second-long XHR that eventually timed out. The bug is almost always visible in the dev tools. The user almost never opens the dev tools.

4. An environment fingerprint. Browser, OS, viewport size, locale, time zone, feature flags active for that user, plan tier, role. The smallest possible set of "what's different about this user from my test account." If you don't have this, you'll spend 40 minutes proving the bug doesn't exist in your environment.

5. An honest expected-vs-actual line. "User clicked Export. Expected: CSV download. Actual: button shows loading spinner for 12 seconds, then nothing. No console error. Network tab shows `/api/export` returning 200 with empty body." That sentence is the difference between a bug an engineer can start on in 30 seconds and a bug an engineer puts off for a week.

If a ticket has all five, it's verified. It belongs in your sprint. If it's missing any of them, it's still a wish, and putting wishes into your sprint is how you end up with a backlog full of "P2: investigate" cards that no one ever picks up.
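
The checklist is simple enough to encode as a gate in front of your sprint board. The sketch below is one way to do it in Python. The `BugReport` class and its field names are hypothetical, and the replay URL is a placeholder. The example values are taken from the expected-vs-actual line above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BugReport:
    """The five pieces of evidence from the checklist above (field names are illustrative)."""

    executed_steps: list[str] = field(default_factory=list)
    replay_url: str = ""
    console_and_network: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    expected: str = ""
    actual: str = ""

    def missing_evidence(self) -> list[str]:
        """Return the checklist items this report does not yet contain."""
        checks = {
            "reproduction steps that were actually run": bool(self.executed_steps),
            "session replay or video": bool(self.replay_url),
            "console and network output": bool(self.console_and_network),
            "environment fingerprint": bool(self.environment),
            "expected-vs-actual line": bool(self.expected and self.actual),
        }
        return [name for name, present in checks.items() if not present]

    def is_verified(self) -> bool:
        """A report is verified only when all five items are present."""
        return not self.missing_evidence()


# What the original email gives you: a symptom and nothing else.
raw_ticket = BugReport(actual="export button doesn't do anything")

# What an engineer-ready report looks like (replay URL is a placeholder).
verified_ticket = BugReport(
    executed_steps=["Signed in", "Opened the project", "Clicked Export"],
    replay_url="https://example.com/replays/placeholder",
    console_and_network=["No console error", "/api/export returned 200 with empty body"],
    environment={"browser": "Safari 17", "role": "admin"},
    expected="CSV download",
    actual="Button shows loading spinner for 12 seconds, then nothing",
)

print(raw_ticket.is_verified(), raw_ticket.missing_evidence())
print(verified_ticket.is_verified())
```

Returning the list of missing items, rather than a bare yes or no, tells whoever triages the ticket exactly what to go and collect next.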

The manual version of this checklist is doable. We wrote up how to reproduce a bug from a support ticket — the literal playbook, step by step. It works. It also takes 30–45 minutes per ticket when you do it right, which is exactly why no one does it right.

How we built an agent to do this part

This is the part of the post where we tell you what we made. We'll keep it short.

FixFirstly is an AI QA agent for SaaS teams that don't have a dedicated QA hire — typically 1 to 15 engineers, post-PMF, with support coming in faster than anyone can triage it. The pitch is one sentence: inbox in, verified GitHub issue out.

Here's the flow. You connect your support channels — Gmail, Slack, Zendesk, Intercom, Linear, a CSV upload, or our API. Bug reports get pulled in automatically. We cluster duplicates using semantic embeddings, so seven tickets about "export button doesn't work" become one issue with seven attached reports instead of seven separate tabs in your Linear board. Then, for each clustered issue, we dispatch an AI agent to a scoped test account on your staging environment. The agent signs in, attempts the repro, and produces:

Steps it actually ran (not steps it guessed)
A session replay video of the attempt
Console output and network activity
The environment fingerprint
An expected-vs-actual summary

If it reproduces the bug, it files a GitHub issue with all of that attached. If it can't, it tells you that too — honestly, with the audit trail of what it tried — so you can decide whether to ping the user or close it. No silent failures. No mystery "the agent said it couldn't repro."
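
To give a feel for the duplicate-clustering step, here is a minimal sketch in Python. It is not FixFirstly's implementation: it uses a simple greedy pass with a cosine-similarity threshold, and the three-number vectors are hand-made stand-ins for real embeddings, which would come from an embedding model and have far more dimensions. The ticket IDs, the `cluster_tickets` name, and the 0.9 threshold are all assumptions.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_tickets(embeddings: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose first ticket is similar enough."""
    clusters: list[list[str]] = []
    for ticket_id, vector in embeddings.items():
        for cluster in clusters:
            representative = embeddings[cluster[0]]
            if cosine_similarity(vector, representative) >= threshold:
                cluster.append(ticket_id)
                break
        else:
            # No existing cluster was close enough, so this ticket starts a new one.
            clusters.append([ticket_id])
    return clusters


# Toy vectors standing in for embeddings of four support tickets.
toy_embeddings = {
    "export-1": [0.90, 0.10, 0.00],
    "export-2": [0.80, 0.20, 0.10],
    "login-1": [0.00, 0.10, 0.90],
    "export-3": [0.95, 0.05, 0.00],
}

clusters = cluster_tickets(toy_embeddings)
print(clusters)
```

On these toy vectors the three export tickets land in one cluster and the login ticket in another. A greedy pass like this depends on ticket order and on the threshold you pick, so treat it as an illustration of the idea rather than a production approach.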

A few specifics worth naming, because they're the questions everyone asks:

Scoped test account. You give the agent its own credentials. It can't touch real user data. If you want, you can isolate it to a single test workspace.
Staging by default. The agent runs against staging, not production. You can point it at production if you really want to, but the default keeps your real users away from a curious agent.
Full audit trail. Every agent run is logged. You can replay exactly what it clicked, what it saw, what it decided. No black box.

The free tier is $0/month with three investigations a month, enough to see whether this works on your real tickets. Starter is $49/month for 25 investigations. Pro is $129/month for 75, and Scale is $349/month for 250.

If you want the deeper version of how the agent actually drives the browser and decides when it's reproduced something, here's how it works.

The shorter version

You're not bad at reproducing bugs. You're being asked to do an impossible thing (translate a three-sentence email into a four-dimensional event) eight times a month, on top of your actual job. "Can't reproduce" isn't a failure of effort. It's a failure of leverage.

The fix isn't more discipline. It's not a better template for support replies. It's not another Playwright test. The fix is moving the reproduction step out of your engineers' calendars and into something that can do it the same way, every time, on every ticket.

Once "verified" means something specific — repro steps, replay, console, environment, expected-vs-actual — your sprint board stops being a wishlist and starts being a queue of real work. Your support response times collapse. Your churn dashboard gets quieter. And you stop typing the sentence.

Stop closing tickets you can't reproduce. FixFirstly's free tier gives you three investigations a month — enough to verify your next handful of bug reports without changing how your team works. Join the waitlist.

Verify your next bug in 24 seconds, not 4 days

FixFirstly reads bug reports from your inbox, reproduces them on staging, and files verified GitHub issues. Free during early access.
