
The 5-Engineer QA Playbook: Shipping Weekly With No QA Hire

May 14, 2026 · 11 min read

You ship on Tuesday. A user emails Wednesday. The screenshot is blurry, the steps are vague, the browser is "Chrome I think." You guess. You spend forty minutes clicking around staging. Nothing breaks. You close the ticket "can't repro." Two weeks later three more users hit the same thing and one of them churns.

This is the QA wall. Every post-PMF team hits it. Most teams hit it without noticing — they just feel slower, grumpier, and a little embarrassed by their changelog.

Here is what to do about it when you don't have a QA hire, can't justify one yet, and refuse to spend half your engineering payroll on a problem that started as "the occasional weird bug."

The QA wall every post-PMF team hits

At five engineers, QA lives in your head. You wrote half the code. You remember the edge case in billing because you fought it in February. When a bug report comes in, you have a hypothesis before you finish reading. You repro it in ten minutes. You fix it before lunch.

At twelve engineers, this stops working. You didn't write most of the code. The engineer who did is on a different feature, in a different timezone, or has quit. The bug arrives. Nobody has the hypothesis. Someone draws the short straw and spends an afternoon on it.

The signal isn't ticket volume. It's the close-reason distribution. Pull your last sixty closed bugs. Count how many ended in "can't reproduce," "need more info," or "closing — no response from reporter." If that number is under 15%, you're fine. If it's 30%+, you've hit the wall. If it's over 50%, you crashed through it months ago and the bugs you think you fixed are still in production.
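
If you'd rather script the count than eyeball it, here is a minimal sketch. It assumes you can export close reasons as plain strings; the label spellings and the sample data are made up for illustration, so match them to whatever your tracker actually emits.

```python
from collections import Counter

# Close reasons treated as "never really resolved". The exact label
# spellings are assumptions; adjust them to your tracker's values.
UNRESOLVED_REASONS = {"can't reproduce", "need more info", "no response from reporter"}


def unresolved_share(close_reasons: list[str]) -> float:
    """Return the fraction of closed bugs that ended in an unresolved reason."""
    if not close_reasons:
        return 0.0
    counts = Counter(reason.strip().lower() for reason in close_reasons)
    unresolved = sum(counts[reason] for reason in UNRESOLVED_REASONS)
    return unresolved / len(close_reasons)


def wall_status(share: float) -> str:
    """Map the share onto the thresholds quoted above."""
    if share > 0.50:
        return "crashed through the wall"
    if share >= 0.30:
        return "hit the wall"
    if share < 0.15:
        return "fine"
    return "in between (no band is named for 15-30%)"


# Hypothetical sample: sixty closed bugs.
sample = (
    ["fixed"] * 38
    + ["can't reproduce"] * 14
    + ["need more info"] * 5
    + ["no response from reporter"] * 3
)
share = unresolved_share(sample)
print(f"{share:.0%} unresolved: {wall_status(share)}")
```

In this made-up sample, 22 of 60 tickets closed without a resolution, which lands in "hit the wall" territory.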

The wall has a second symptom: triage stops being a fifteen-minute morning ritual and becomes a meeting. The meeting has a slide. The slide has the phrase "we need a better process." You don't need a better process. You need the loop to run without you.

The four real options when bugs outrun your sprint

There are four. Three of them are bad in specific, knowable ways. Be honest with yourself about which one you're choosing.

Hire a QA engineer

A mid-level QA engineer in the US runs $90,000–$140,000 all-in once you count benefits, equipment, and the slice of your HR/finance time they consume. In Europe it's lower, but the ramp is the same: three months before they're productive, six before they're opinionated enough to tell you which bugs to ignore.

Then there's the bus factor. One QA engineer is one QA engineer. They take vacation. They get sick. They eventually leave. The institutional knowledge they built — which flaky test to ignore, which staging account has the weird billing state, which integration silently fails on Safari — leaves with them. You hire the second one to fix that, and now you're spending $200k+ on QA at a fifteen-engineer company.

This isn't a bad option. It's the right option at thirty engineers. At eight, it's a luxury purchase you'll justify with phrases like "investing in quality."

Outsource to a QA agency

Agencies sell you per-test pricing, which sounds rational until you realize "a test" is a unit they define. Onboarding takes two to four weeks because they need to understand your product before they can break it. Once they're in, you get reports. The reports are thorough. They are also opaque: you don't see the tester doing the work, you see the writeup, and the writeup is in someone else's voice.

Worst part: agencies optimize for finding bugs, because finding bugs is what they bill for. They will find bugs. You will not be happier. Your engineers will spend Mondays arguing about which agency-reported "issues" are real.

Agencies make sense for a pre-launch hardening sprint or a compliance audit. They do not make sense as ongoing QA for a weekly-shipping product.

Suffer

This is the option most teams pick by default, because it requires no decision. You eat the bugs. You triage when you can. You close "can't repro" tickets. You tell yourself the churn is competitor pressure.

The cost is invisible until it's catastrophic. Your churn dashboard finds out two quarters later, when an annual cohort analysis shows the leak. By then the user who emailed you in March is gone, and so are the three friends they would have referred.

There's a quieter cost too. At the wall, senior engineers spend 30–50% of their week on bug triage: reading reports, asking for clarification, hunting repros that never materialize. That's the most expensive labor in your company spent on its lowest-leverage work. It's not a process problem. It's a routing problem.

Automate the loop

This is the fourth option. It's the one this post is about. It only works if you understand what "the loop" actually is, which the next section covers. Skip past it if you already do.

What "automating the loop" actually means

When people say "automate QA," they usually mean one of two things. Both are wrong for the problem at hand.

The first wrong thing is test authoring — Playwright suites, Cypress flows, the whole regression rig. These are good. You should have them. They catch regressions on code you wrote on purpose. They do not catch the bug a user just emailed you, because nobody wrote a test for the thing that broke. If they had, it wouldn't have broken.

The second wrong thing is generic AI — drop ChatGPT on your bug reports, get a summary. Summaries don't fix anything. The user already wrote a summary; that's the bug report. What you need is the next step, which is the thing humans do that AI usually doesn't: open the app, click the buttons, watch it break, record what happened.

The loop is four steps. Each one should be runnable without a human in the middle.

Step one: ingest

Bugs arrive in five places: Gmail, Slack, your support inbox (Zendesk, Intercom, whatever), Linear tickets users filed themselves, and the occasional CSV someone exports from a tool you don't have an integration for. Automation step one is pulling all of these into a single stream. Not normalizing them into perfect tickets — just getting them in one place, with their original text intact, so the next step has something to chew on.
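
A rough sketch of what "one stream, original text intact" can look like. The payload shapes below are hypothetical stand-ins, not the real Gmail or Slack API responses, and `RawReport`, `from_email`, `from_chat`, and `single_stream` are names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RawReport:
    source: str            # where it came from, e.g. "gmail" or "slack"
    source_id: str         # id inside that source, kept for traceability
    received_at: datetime  # timezone-aware arrival time
    text: str              # the reporter's words, unmodified


def from_email(messages: Iterable[dict]) -> Iterator[RawReport]:
    # Hypothetical payload: {"id", "received_at" (ISO 8601 string), "body"}.
    for m in messages:
        yield RawReport("gmail", m["id"], datetime.fromisoformat(m["received_at"]), m["body"])


def from_chat(messages: Iterable[dict]) -> Iterator[RawReport]:
    # Hypothetical payload: {"ts" (epoch seconds), "text"}.
    for m in messages:
        when = datetime.fromtimestamp(float(m["ts"]), tz=timezone.utc)
        yield RawReport("slack", str(m["ts"]), when, m["text"])


def single_stream(*sources: Iterable[RawReport]) -> list[RawReport]:
    """Merge every source into one list, oldest first. The text is never rewritten."""
    return sorted((r for src in sources for r in src), key=lambda r: r.received_at)


emails = [{"id": "m1", "received_at": "2026-05-13T09:30:00+00:00", "body": "CSV export hangs"}]
chats = [{"ts": 1778662800, "text": "I clicked download and nothing happened"}]
stream = single_stream(from_email(emails), from_chat(chats))
```

The only normalization here is the timestamp, so the stream can be ordered. Everything else stays as the reporter wrote it, which is what the clustering step needs.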

Step two: cluster

This is the step most teams skip and it's the one that matters most. Three users describe "the export button doesn't work" three different ways. One says "CSV export hangs." One says "I clicked download and nothing happened." One pastes a stack trace. These are the same bug. Without clustering, you triage them as three separate tickets, repro three times, and probably close at least one as "can't repro" because that user happened to be on Firefox.

Good clustering uses semantic embeddings — not keyword match. "Hangs" and "nothing happened" share zero keywords and the same meaning. We wrote about how embeddings collapse duplicate bug reports into a single signal if you want the deeper version. The short version: a similarity threshold around 0.55 groups the obvious dupes without merging unrelated reports, and you stop reading the same bug eight times.
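
To make the threshold concrete, here is a small sketch of greedy clustering by cosine similarity. The three-dimensional vectors are hand-made toys, not output from a real embedding model, and the right threshold depends on the model you use, so treat 0.55 as a starting point rather than a constant.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(embeddings: dict[str, list[float]], threshold: float = 0.55) -> list[list[str]]:
    """Greedy pass: join the first cluster whose representative is similar enough."""
    clusters: list[tuple[list[float], list[str]]] = []
    for report_id, vector in embeddings.items():
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(report_id)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vector, [report_id]))
    return [members for _, members in clusters]


# Toy vectors standing in for real embeddings of four reports.
reports = {
    "r1": [0.9, 0.1, 0.2],  # "CSV export hangs"
    "r2": [0.7, 0.3, 0.5],  # "I clicked download and nothing happened"
    "r3": [0.8, 0.0, 0.4],  # pasted stack trace from the export path
    "r4": [0.0, 1.0, 0.1],  # unrelated: a typo on the login page
}
groups = cluster(reports)
```

With these toy vectors the three export reports end up in one group and the login typo stays alone. A production setup would likely compare against a cluster centroid rather than the first member, but the threshold works the same way.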

Step three: reproduce

This is the hard one. Clustering is a math problem. Reproduction is an agent problem. Something has to read the clustered report, open your staging app, sign in as a scoped test user, follow the steps (or guess at them when they're vague, which is most of the time), and either reproduce the bug or report that it couldn't.

When it works, you get a verified repro: video of the session, console logs, network requests, the exact click path. When it doesn't work, you get a "couldn't reproduce — here's what I tried" report, which is dramatically more useful than the user's original email because at least you know what's been ruled out. We've written more on what it takes to turn a vague support ticket into a verified repro.

A good repro looks like this: cluster of 4 reports, agent signed in 2 minutes ago, reproduced on attempt 1, video attached, console error on line 247 of checkout.tsx, GitHub issue filed, assignee suggested.

If you can get to that artifact without a human touching it, you have automated the loop.

Step four: file

The output of the loop is a GitHub issue (or Linear, or whatever you use) with the repro attached, severity scored, and ideally routed to the right engineer based on the file paths in the stack trace. Not an email. Not a Slack ping. An issue, in the system your engineers already work from, with everything they need to start fixing.

This is also where the triage layer pays off — by the time the issue lands, severity is already scored, duplicates are already merged, and the engineer who opens it sees one ticket representing twelve users, not twelve tickets representing the same bug.
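
One way the filing step could be sketched, under loud assumptions: the `Repro` fields, the `OWNERS` map, and the handles in it are all invented for illustration, and the returned dict is a plain structure, not a GitHub or Linear API payload.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repro:
    summary: str
    cluster_size: int  # how many user reports this issue represents
    reproduced: bool
    attempts: int
    severity: str      # e.g. "P1"
    video_url: str
    stack_trace: str


# Hypothetical ownership map: path prefix -> engineer handle.
OWNERS = {"src/checkout/": "alice", "src/export/": "bob"}

# Matches "path/to/file.tsx:247" style frames in a JS/TS stack trace.
FRAME = re.compile(r"([\w./-]+\.(?:tsx?|jsx?)):(\d+)")


def suggest_assignee(stack_trace: str, owners: dict[str, str] = OWNERS) -> Optional[str]:
    """Return the owner of the first stack frame that matches a known prefix."""
    for path, _line in FRAME.findall(stack_trace):
        for prefix, owner in owners.items():
            if path.startswith(prefix):
                return owner
    return None


def build_issue(repro: Repro) -> dict:
    """Assemble title, body, and suggested assignee for one clustered bug."""
    status = f"reproduced on attempt {repro.attempts}" if repro.reproduced else "could not reproduce"
    body = "\n".join([
        f"Reports in cluster: {repro.cluster_size}",
        f"Severity: {repro.severity}",
        f"Status: {status}",
        f"Video: {repro.video_url}",
        "Stack trace:",
        repro.stack_trace,
    ])
    return {
        "title": f"[{repro.severity}] {repro.summary} ({repro.cluster_size} reports)",
        "body": body,
        "assignee": suggest_assignee(repro.stack_trace),
    }


issue = build_issue(Repro(
    summary="Checkout fails on submit",
    cluster_size=4,
    reproduced=True,
    attempts=1,
    severity="P1",
    video_url="https://example.com/repro.mp4",
    stack_trace="TypeError: total is undefined\n    at submit (src/checkout/checkout.tsx:247:13)",
))
```

When no stack frame matches a known prefix, the assignee comes back as `None`, and a human picks the owner. That is a safer default than guessing.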

When you should still hire QA (the honest answer)

This is the part where most posts pretend their product replaces everything. It doesn't. Hire a QA engineer if any of these apply:

Regulated industries. Healthcare, finance, anything with auditors. You need humans with signed names attached to test runs. An agent's audit trail is real, but it isn't a person, and the regulator wants a person.

Mobile-heavy products. Native iOS and Android QA is its own discipline. Device farms, OS version matrices, App Store review cycles. Agents that drive web staging environments don't help you here. You need someone who owns the device shelf.

Compliance-driven release cycles. If your release process requires a sign-off ceremony — SOC 2 evidence, change-control board, formal UAT — you need a human to own that. Automation can feed the ceremony, but it can't be the signatory.

More than 25 engineers. At that size you have specialization pressure. QA engineers pay for themselves not by finding bugs but by owning the test infrastructure that lets the rest of the team move faster. The math flips.

For everyone else — small post-PMF teams shipping weekly to a web product — the loop is enough, and an FTE is overkill until it isn't.

A 2-week plan to set up no-QA QA

Concrete days. Skip whatever you've already done.

Week 1: ingest and stage

Monday. Pick your dominant source. For most teams it's Gmail (support@) or Slack (a #bugs channel). Don't try to wire everything at once. Pick the one place where 70% of your bug signal arrives. Connect that first. You can layer Zendesk or Intercom in week three.

Tuesday. Create a scoped test account in your staging environment. Not a copy of your production admin user. A clean account with realistic data — a few orgs, a few projects, some seeded transactions if you have billing. This account is what the agent will use. Treat it like a service account: rotate credentials, log everything, don't give it production access. Ever.

Wednesday. Verify the test account can actually do the things users complain about. If 40% of your bugs are about checkout, the test account needs a payment method, a cart, a shipping address. If you can't manually walk the test account through the bug, the agent can't either.

Thursday. Connect the source to your loop. Watch the first batch of messages flow in. Look at clustering output. Sanity-check it — if two clearly different bugs got merged, your threshold is too loose. If duplicates aren't merging, it's too tight.
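
If you want the Thursday sanity check to be less of a gut call, label a handful of report pairs by hand and score the threshold against them. This is only a sketch: the similarity scores and labels below are invented, and `diagnose` is a name made up for this example.

```python
def check_threshold(pairs: list[tuple[float, bool]], threshold: float) -> tuple[int, int]:
    """Count mistakes on hand-labelled pairs of (similarity, same_bug)."""
    false_merges = sum(1 for score, same in pairs if score >= threshold and not same)
    missed_dupes = sum(1 for score, same in pairs if score < threshold and same)
    return false_merges, missed_dupes


def diagnose(pairs: list[tuple[float, bool]], threshold: float) -> str:
    """Translate the mistake counts into the loose/tight language used above."""
    false_merges, missed_dupes = check_threshold(pairs, threshold)
    if false_merges and missed_dupes:
        return "mixed: no single threshold separates these pairs"
    if false_merges:
        return "too loose: raise the threshold"
    if missed_dupes:
        return "too tight: lower the threshold"
    return "ok"


# Invented scores for five pairs you labelled yourself.
labelled_pairs = [
    (0.90, True),   # obvious duplicate
    (0.62, True),   # same bug, different wording
    (0.58, True),   # same bug, one report is a stack trace
    (0.48, False),  # similar area, different bug
    (0.13, False),  # unrelated
]

for candidate in (0.40, 0.55, 0.70):
    print(candidate, diagnose(labelled_pairs, candidate))
```

Ten or twenty labelled pairs are usually enough to see which direction to move. If the result is "mixed," the problem is the embeddings or the report text, not the number.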

Friday. Read the clustered output. Don't run agents yet. Just confirm the signal is real. You should see fewer items than raw messages, and the items should make sense.

Week 2: reproduce and ship

Monday. Define severity thresholds. What counts as a P0? What gets auto-filed vs. flagged for human review? Most teams start conservative: only auto-file P1+ with successful repros, send everything else to a review queue. You can loosen this once you trust the agent.
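
The conservative starting policy fits in a few lines. This sketch assumes "P1+" means P1 or more severe (so P0 and P1), and the `Severity` enum and `route` function are illustrative names, not part of any particular tool.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Lower number means more severe, matching the usual P0..P3 convention.
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3


def route(severity: Severity, reproduced: bool, auto_file_floor: Severity = Severity.P1) -> str:
    """Auto-file only verified repros at or above the floor; review everything else."""
    if reproduced and severity <= auto_file_floor:
        return "auto_file"
    return "review_queue"


decisions = {
    "P0 with repro": route(Severity.P0, reproduced=True),
    "P1 without repro": route(Severity.P1, reproduced=False),
    "P2 with repro": route(Severity.P2, reproduced=True),
}
```

Loosening the policy later is a one-argument change: pass `auto_file_floor=Severity.P2` once you trust the repros.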

Tuesday. Wire GitHub (or Linear). Test the issue template. Make sure the repro video, console logs, and cluster summary all land in the issue body in a format your engineers will actually read.

Wednesday. Run the first real agent batch. Pick the top-priority cluster from yesterday's queue. Watch the agent attempt the repro. Read the output. Note what it got right and what it missed.

Thursday and Friday. Iterate. The first week of agent runs is calibration. Some bugs the agent will nail on attempt one. Some it will give up on, and the report will tell you why ("couldn't find the export button — has the UI changed since the report?"). Use the failures to tune your test account or your staging environment, not to abandon the loop.

By the end of week two you should have a working pipeline: bug arrives, gets clustered, gets a repro attempt, gets filed with evidence. The full mechanic is laid out on the how-it-works page if you want the diagram.

What changes when this works

The visible change is your triage queue. It shrinks. Senior engineers stop spending mornings reading vague Gmail threads. Triage time, in our rough measurement across teams running the loop, drops 60–80%. Not because bugs go away — they don't — but because by the time a bug reaches a human, it's already a clustered, reproduced, prioritized issue in GitHub.

The invisible change is the close-reason distribution we started with. "Can't reproduce" stops dominating. It still appears (some bugs genuinely don't reproduce, and that's information too), but it stops being the default exit. Your real "fixed" count goes up, because the old count was inflated by tickets closed as "can't reproduce" that came back as churn.

There's a third change, harder to attribute. Your shipping pace stops stuttering. The Friday afternoon where you planned to merge three PRs and instead spent four hours on a support escalation — that afternoon stops happening as often. Not never. Less often. We've written more on why "can't reproduce" is the real wedge and what it costs when you don't fix it.

Your dashboard graphs improve weeks before you notice. Churn moves first. NPS moves second. Eventually someone in a board meeting asks what changed and you struggle to point at one thing, because the thing that changed isn't a feature — it's that the bugs you used to lose started getting fixed.

Run it on your own bugs

The free tier is three investigations per month, no card. That's enough to test the loop against your real bug stream and decide if it earns a line on your monthly invoice. If it does, pricing is on the site — Starter is $49/mo for 25 investigations, Pro is $129 for 75, and Scale is $349 for 250. If it doesn't, you keep the three free investigations and the cluster view, and you've lost nothing.

Join the waitlist. Bring the bugs you've been avoiding.

Verify your next bug in 24 seconds, not 4 days

FixFirstly reads bug reports from your inbox, reproduces them on staging, and files verified GitHub issues. Free during early access.
