There’s a moment that happens about a year after a team commits to test automation, and almost nobody warns you about it.
You did everything right. You hired the automation engineers. You built the suite. Four hundred end-to-end tests, running in the pipeline, green checkmarks on every merge. Automation was supposed to free your QA people from the grind so they could focus on the hard, interesting problems. That was the whole promise.
So why are your two senior QA engineers more swamped than they were when everything was manual? Why does every standup include someone saying “the suite’s flaky again, I’m looking into it”? Why does it feel like you bought a robot to do the dishes and now you spend all your time fixing the robot?
This is the part of automation nobody puts in the sales deck. And once you understand it, a lot of confusing things about your QA team’s workload suddenly make sense.
The number that should stop you cold
Let me give you the figure first, because it’s genuinely startling.
The World Quality Report has found that maintenance can eat around half of a test automation budget. Other 2026 breakdowns put ongoing maintenance at 40 to 70% of total automation cost – consistently outweighing the initial build. There’s a widely cited Medium piece from this year describing a Series B startup with 400 automated tests where two senior engineers were spending 60% of their time just keeping the suite alive.
Sit with that. Roughly half the money and time you put into automation, and sometimes more, doesn’t go toward catching new bugs or expanding coverage. It goes toward keeping the tests you already wrote from falling over.
And here’s the cruel part: it scales the wrong way. Double your test coverage, you roughly double your maintenance burden. The economics don’t improve as you grow – they get heavier. Every test you add is another test that can break for reasons that have nothing to do with an actual bug.
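To make that scaling concrete, here is a back-of-the-envelope sketch built on the startup figures above. It assumes upkeep grows in a straight line with test count and that every test costs the same to maintain, which is a rough rule of thumb rather than a measured law. The function name and the 800-test scenario are made up for illustration.

```python
def maintenance_fte(tests: int, fte_per_test: float) -> float:
    """Engineer-equivalents spent on upkeep, assuming cost is linear in test count."""
    return tests * fte_per_test


# Figures from the startup example: 400 tests, two engineers at 60% each.
baseline_tests = 400
baseline_fte = 2 * 0.60  # 1.2 engineers' worth of time

# Implied upkeep per test (assumption: all tests cost the same to maintain).
fte_per_test = baseline_fte / baseline_tests

# Hypothetical scenario: the suite doubles to 800 tests.
doubled = maintenance_fte(800, fte_per_test)
print(f"{doubled:.1f} engineers on maintenance at 800 tests")  # prints 2.4
```

Under those assumptions, doubling the suite means finding more than one additional full-time engineer just to stand still.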
Why “passing tests” quietly became a full-time job
To understand why this happens, you have to understand what most automated tests are actually checking.
A traditional UI test doesn’t really test what the user experiences. It tests implementation details – this specific button has this specific CSS class, this element sits at this spot in the page structure. So a developer refactors a component, renames a class during perfectly normal cleanup, and twelve tests go red. Nothing is broken. The app works fine. But your suite is screaming, and now an engineer has to stop, investigate each failure, confirm it’s not a real bug, and update the test.
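Here is a toy model of that failure, not a real browser test. The `Element` class, the two finder functions, and the class names are all hypothetical. In Playwright's Python API the two styles would correspond roughly to `page.locator(".btn-primary")` and `page.get_by_role("button", name="Place order")`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    role: str       # what the element is to a user, e.g. "button"
    name: str       # the visible label a user reads
    css_class: str  # styling hook, an implementation detail


def find_by_class(page: List[Element], css_class: str) -> Optional[Element]:
    """Implementation-coupled lookup: depends on a styling hook."""
    return next((el for el in page if el.css_class == css_class), None)


def find_by_role(page: List[Element], role: str, name: str) -> Optional[Element]:
    """Behavior-coupled lookup: depends on what the user sees."""
    return next((el for el in page if el.role == role and el.name == name), None)


# The same button before and after a routine cleanup renamed its class.
before = [Element("button", "Place order", "btn-primary")]
after = [Element("button", "Place order", "cta-button")]

brittle_before = find_by_class(before, "btn-primary")        # found
brittle_after = find_by_class(after, "btn-primary")          # None: the test goes red
intent_after = find_by_role(after, "button", "Place order")  # still found

print("class lookup after refactor:", brittle_after)
print("role lookup after refactor:", intent_after)
```

Nothing about the button changed from the user's side, yet only the second lookup survives the rename.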
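A test that breaks this way, on a change no user would ever notice, is brittle. Its close cousin is the flaky test, which passes and fails on the same code with no change at all. Both produce failures that aren’t bugs, and that noise is the silent killer of automation ROI, for two reasons.
The obvious one is wasted time. Surveys this year found QA engineers spending 20 to 30% of their work week just triaging failures – sorting the real bugs from the noise. That’s a day or more, every week, per engineer, spent being a janitor for the test suite instead of a quality engineer.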
The less obvious one is worse: flaky tests destroy trust. Once a suite cries wolf enough times, people stop believing it. They start re-running failed tests hoping they’ll pass the second time. They start ignoring red builds. And the day a flaky-looking failure is actually a real, shipping-to-production bug – they wave it through. At that point your expensive automation isn’t just wasting time, it’s actively giving you false confidence. That’s worse than having no tests at all, because at least without tests you know you’re flying blind.
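One way to stop re-runs from quietly eroding trust is to record them instead of forgetting them. The sketch below flags any test that both passed and failed on the same commit, which is a common working definition of flaky. The run-history format and the test names are assumptions for illustration, not the output of any particular CI system.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Run = Tuple[str, str, bool]  # (test name, commit, passed)


def find_flaky(runs: Iterable[Run]) -> List[str]:
    """Return tests that produced both outcomes on identical code."""
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    flaky = {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Hypothetical history for one commit, including re-runs.
history = [
    ("checkout_happy_path", "abc123", False),
    ("checkout_happy_path", "abc123", True),          # passed on re-run, same code
    ("login_rejects_bad_password", "abc123", False),
    ("login_rejects_bad_password", "abc123", False),  # fails every time: investigate as a bug
    ("search_returns_results", "abc123", True),
]

flaky = find_flaky(history)
print(flaky)  # ['checkout_happy_path']
```

A list like this lets a team quarantine and fix the noisy tests on purpose, while a test that fails consistently keeps its claim on everyone's attention.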
The trap teams fall into next
When a team hits this wall, the instinct is to throw bodies at it. Hire another automation engineer to handle the maintenance load. And it works for about a quarter – until the suite grows again and you’re right back where you started, just with a bigger payroll.
This is the structural ceiling of implementation-based testing, and you cannot hire your way out of it. The problem isn’t that your engineers are slow or your process is sloppy. The problem is that the tests were built to check how the app is built rather than what it’s supposed to do, and that design choice generates maintenance work forever.
I want to be fair here, because there’s a version of this that turns into a tool pitch and I don’t want to insult you with one. No tool makes maintenance zero. Anyone promising that is selling. But the design of your automation absolutely determines whether maintenance is a manageable 15-20% or a crushing 50-70%, and most teams are unknowingly sitting at the wrong end of that range.
What actually moves the needle
A few things genuinely change the math, and they’re worth knowing whether you fix this in-house or bring in help.
First, the framework choice is not cosmetic. Playwright’s auto-waiting alone eliminates a large chunk of the timing-related flakiness that plagues older Selenium suites – roughly 40% of flakiness issues by some measures. If your suite is drowning in “element not found” failures that vanish on a re-run, a meaningful share of that is fixable at the framework level.
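To see why waiting matters, here is a simplified stand-in for the idea: poll until the element shows up or a deadline passes, instead of looking once and failing. This is not how Playwright is implemented (its real checks also cover things like visibility and whether the element is enabled). `wait_for` and `SlowPage` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(lookup: Callable[[], Optional[T]], timeout: float = 2.0, interval: float = 0.05) -> T:
    """Retry lookup() until it returns something or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        found = lookup()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not appear before the timeout")
        time.sleep(interval)


class SlowPage:
    """Hypothetical page whose button only renders on the third lookup."""

    def __init__(self) -> None:
        self.lookups = 0

    def find_button(self) -> Optional[str]:
        self.lookups += 1
        return "Place order" if self.lookups >= 3 else None


# A single immediate lookup misses the button: the classic "element not found".
naive_page = SlowPage()
naive_result = naive_page.find_button()

# The polling lookup finds it on the third attempt.
patient_page = SlowPage()
patient_result = wait_for(patient_page.find_button)

print(naive_result, patient_result, patient_page.lookups)  # None Place order 3
```

The first lookup is the kind of failure that vanishes on a re-run, because by then the page has usually finished rendering.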
Second, test behavior, not implementation. Tests written around what the user is trying to accomplish – intent – survive a CSS refactor that would shatter a suite built on brittle selectors. This is the entire idea behind self-healing automation: locators that adapt when the page shifts, instead of breaking the moment a class name changes.
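A minimal sketch of the self-healing idea, assuming the simplest possible design: try an ordered list of locator strategies and report which one matched. Real tools are more sophisticated, and the page model, attribute names, and strategy labels here are all invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Element = Dict[str, str]
Strategy = Tuple[str, Callable[[Element], bool]]


def locate(page: List[Element], strategies: List[Strategy]) -> Tuple[Element, str]:
    """Return the first matching element and the label of the strategy that found it."""
    for label, matches in strategies:
        for element in page:
            if matches(element):
                return element, label
    # Nothing matched at all: treat this as a possible real regression.
    raise LookupError("no locator strategy matched")


# Ordered from most brittle to most user-facing (hypothetical strategies).
submit_strategies: List[Strategy] = [
    ("css class", lambda el: el.get("class") == "btn-primary"),
    ("test id", lambda el: el.get("testid") == "submit-order"),
    ("role and name", lambda el: el.get("role") == "button" and el.get("name") == "Place order"),
]

# The page after a refactor renamed the class but left everything else alone.
refactored_page: List[Element] = [
    {"role": "button", "name": "Place order", "class": "cta-button", "testid": "submit-order"},
]

element, used = locate(refactored_page, submit_strategies)
print(f"found via fallback: {used}")  # found via fallback: test id
```

Returning the strategy label is a deliberate choice in this sketch: a fallback that fires silently could hide a change someone should review, so it is worth logging whenever the primary locator misses.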
Third, and most important, stop over-building at the top. A huge amount of maintenance pain comes from teams stacking too many slow, fragile end-to-end UI tests when a faster, sturdier API or unit test would have caught the same issue. The teams getting crushed by maintenance are almost always the ones running an upside-down test pyramid – heavy on exactly the tests that break most.
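A quick way to check the shape of your own suite is to count tests per layer. The sketch below does that for two hypothetical suites. The layer names, the counts, and the simple rule for "upside-down" (more tests at a higher layer than the one beneath it) are assumptions, not an industry standard.

```python
from typing import Dict


def layer_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the suite that lives at each layer."""
    total = sum(counts.values())
    return {layer: n / total for layer, n in counts.items()}


def is_inverted(counts: Dict[str, int]) -> bool:
    """True when a higher, slower layer outnumbers the layer below it."""
    return counts["e2e"] > counts["api"] or counts["api"] > counts["unit"]


# Hypothetical suites: one top-heavy, one shaped like a pyramid.
top_heavy = {"unit": 140, "api": 60, "e2e": 400}
pyramid = {"unit": 700, "api": 200, "e2e": 100}

shares = layer_shares(top_heavy)
print(f"e2e share: {shares['e2e']:.0%}, inverted: {is_inverted(top_heavy)}")
print(f"e2e share: {layer_shares(pyramid)['e2e']:.0%}, inverted: {is_inverted(pyramid)}")
```

In the top-heavy suite two thirds of the tests sit at the layer that breaks most often, which is where the maintenance bill comes from.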
The honest takeaway
Automation was never the finish line. It was a trade – you swapped the cost of running tests by hand for the cost of keeping automated tests alive. For a lot of teams that trade quietly went bad, and the tell is simple: if your senior QA people spend more time fixing tests than expanding coverage, your automation has stopped being an asset and started being a liability you’re paying to maintain.
The fix isn’t more effort or more hires. It’s building the suite so it doesn’t generate this work in the first place – the right framework, behavior over implementation, and the restraint not to automate everything at the most expensive layer. Get that right and automation does what it promised. Get it wrong and you’ve just bought yourself a very expensive robot to babysit.
Worth auditing one number this week: of the hours your QA team spends on the test suite, how many go to catching new problems versus fixing old tests? If that ratio is anywhere near 50/50, you already have your answer.
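If you want to run that audit from a time log, a sketch like this is enough. The log format and category names are hypothetical. The thresholds borrow the rough bands mentioned earlier (about 15 to 20% as manageable, 50% and up as crushing) and are a rule of thumb, not a benchmark.

```python
from typing import Iterable, Tuple

Entry = Tuple[float, str]  # (hours, category): "maintenance" or "new_coverage"


def maintenance_share(entries: Iterable[Entry]) -> float:
    """Fraction of suite hours spent fixing existing tests."""
    entries = list(entries)
    total = sum(hours for hours, _ in entries)
    upkeep = sum(hours for hours, category in entries if category == "maintenance")
    return upkeep / total if total else 0.0


def verdict(share: float) -> str:
    """Map a share onto the rough bands used in this article (assumed cutoffs)."""
    if share <= 0.20:
        return "manageable"
    if share >= 0.50:
        return "crushing"
    return "drifting"


# Hypothetical week for a two-person QA team.
week = [
    (6.0, "maintenance"),
    (4.0, "new_coverage"),
    (5.0, "maintenance"),
    (5.0, "new_coverage"),
]

share = maintenance_share(week)
print(f"{share:.0%} of suite hours went to maintenance: {verdict(share)}")
```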