85% Test Coverage Doesn't Mean 85% Safe

Every engineering leader has said some version of this in a board meeting or an investor call. “Our test coverage is at 85%.” It lands well. It sounds responsible, mature, under control. Heads nod. Everyone moves on to the next slide.

I want to ruin that sentence for you a little, because it’s one of the most quietly dangerous numbers in software, and almost nobody pushes on it.

Here’s the follow-up question that should always come next and almost never does: coverage of what?

Because that single percentage is hiding something, and the gap between what it sounds like and what it actually measures is exactly where production incidents come from.

What the number actually means versus what you think it means

When someone says “85% coverage,” what most leaders hear is “85% of the things that could go wrong, we’ve checked.” Safe. Validated. Tested.

That is not what it means.

Code coverage measures how much of your code was executed when the tests ran. That’s it. It tells you a line of code fired during a test. It tells you absolutely nothing about whether the test checked that the line did the right thing.

This distinction sounds academic until you see how it plays out. You can write a test that calls your payment function, executes every line inside it, and never once checks whether the customer was actually charged the correct amount. That test passes. It bumps your coverage number up. And it has verified essentially nothing. The industry even has a name for this: assertion-free testing. These are tests that execute code without meaningful assertions, inflating the metric while providing false confidence.

So “85% coverage” can mean “we’ve thoroughly validated 85% of our system.” Or it can mean “85% of our code runs during tests that may or may not be checking anything useful.” The number looks identical in both cases. The risk is wildly different.
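To make that concrete, here is a minimal sketch in Python. The function charge_customer and both tests are hypothetical, invented purely for illustration. Both tests execute every line of the function, so a line-coverage tool would report the same number for either one. Only the second would notice if the arithmetic were wrong.

```python
def charge_customer(cart_total_cents: int, discount_percent: int) -> int:
    """Hypothetical payment function: returns the amount charged, in cents."""
    discount = cart_total_cents * discount_percent // 100
    return cart_total_cents - discount


def test_charge_runs():
    # Executes every line of charge_customer, so line coverage is 100%,
    # but asserts nothing. It would still pass if the math were wrong.
    charge_customer(10_000, 10)


def test_charge_amount():
    # Same call, same coverage, but this one checks the outcome.
    assert charge_customer(10_000, 10) == 9_000
```

Delete the second test and the coverage report does not change at all. That is the whole problem in about fifteen lines.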

The 5% that sinks you

Here’s the part that should genuinely worry you.

Say you’ve got 95% coverage. Sounds fantastic. But coverage is an average across your whole codebase, and averages hide their worst members. What’s in the missing 5%?

If that 5% is logging utilities and some admin panel nobody touches – fine, who cares. But if that 5% includes your checkout flow, your login system, or your error handling, you don’t have 95% safety. You have a beautiful number sitting directly on top of the exact functions that will cost you customers and revenue when they break. A well-tested checkout at 70% coverage beats 95% coverage that skips payment validation, every single time.
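The arithmetic is worth seeing once. The sketch below uses made-up module names and line counts (none of these figures come from a real project) to show how a 95% headline can sit on top of a checkout module with no coverage at all.

```python
# Hypothetical per-module line counts: (module, total_lines, covered_lines).
modules = [
    ("logging_utils", 4000, 4000),
    ("admin_panel", 3000, 3000),
    ("catalog", 2500, 2500),
    ("checkout", 500, 0),
]

total = sum(lines for _, lines, _ in modules)
covered = sum(hit for _, _, hit in modules)
overall = 100 * covered / total

# The same data, broken out per module instead of rolled into one number.
per_module = {name: 100 * hit / lines for name, lines, hit in modules}
weakest = min(per_module, key=per_module.get)

print(f"overall: {overall:.1f}%")                          # overall: 95.0%
print(f"weakest: {weakest} at {per_module[weakest]:.1f}%")  # weakest: checkout at 0.0%
```

The headline and the per-module breakdown come from exactly the same data. Only one of them tells you where the risk is.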

This is why the headline percentage is almost useless on its own. A senior QA automation engineer working in banking and healthcare put it bluntly this year: in systems where real money moves and real patient data flows, he learned fast that 100% coverage does not equal real confidence. The systems are too complex, with too many transaction paths and external dependencies, for a coverage number to capture actual risk. Covering the whole app looks reassuring on paper and still misses the picture that matters.

Why teams chase the number anyway

If coverage is such a flawed signal, why is everyone obsessed with it?

Because it’s easy. It’s the most visible, easiest-to-track number in the entire testing process. A tool spits it out automatically, it goes green or red, you can put it on a dashboard and watch it climb. Risk – the thing you actually care about – is hard to measure and impossible to reduce to one tidy percentage. So teams optimize the thing that’s easy to measure instead of the thing that matters. Classic.

And it gets actively harmful when coverage becomes a target. The moment you tell a team “get to 90%,” you’ve changed their behavior. They stop writing tests that catch bugs and start writing tests that hit the number – superficial tests, happy-path tests, tests with no real assertions, just to color in the last few percent. You end up paying for a larger, slower, more expensive test suite that catches fewer real problems than the smaller one you had before. The metric goes up. The quality goes down. Goodhart’s law, live in your CI pipeline.

What to ask instead

So if “what’s our coverage?” is the wrong question, what are the right ones? This is where you separate teams that measure activity from teams that manage risk.

Ask what’s covered, not how much. Are the critical user journeys – the paths that touch money, data, and authentication – actually validated? I’d rather have every revenue-critical flow locked down and a coverage number in the seventies than a glamorous 90% that got there by testing the easy stuff.

Ask whether the tests assert anything. A test that runs code but checks no outcome is theater. The real question isn’t “did this code execute,” it’s “if this code did the wrong thing, would a test catch it?”
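One way to put that question to a test suite is the idea behind mutation testing: break the code on purpose and see whether any test fails. The sketch below does it by hand with a hypothetical apply_discount function and a deliberately broken copy. Real mutation-testing tools automate the breaking, and everything named here is an assumption made for illustration.

```python
from typing import Callable

DiscountFn = Callable[[int, int], int]


def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical function under test."""
    return total_cents - total_cents * percent // 100


def mutant_apply_discount(total_cents: int, percent: int) -> int:
    # Deliberately wrong variant: "-" flipped to "+".
    return total_cents + total_cents * percent // 100


def weak_test(fn: DiscountFn) -> None:
    fn(10_000, 10)  # runs the code, checks nothing


def strong_test(fn: DiscountFn) -> None:
    assert fn(10_000, 10) == 9_000


def catches(test: Callable[[DiscountFn], None], broken_fn: DiscountFn) -> bool:
    """True if the test fails when handed the broken implementation."""
    try:
        test(broken_fn)
    except AssertionError:
        return True
    return False


print(catches(weak_test, mutant_apply_discount))    # False
print(catches(strong_test, mutant_apply_discount))  # True
```

Both tests give the same line coverage on apply_discount. Only one of them survives the question.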

Ask about coverage by risk tier, not as a flat average. Your authentication logic handling sensitive data deserves a far higher bar than a logging helper. Uniform targets across everything waste effort on the trivial and under-protect the dangerous. As a rough rule of thumb, fintech and healthcare flows warrant 85-95%; a low-risk internal tool might be perfectly fine at 60-70%. One blanket number for the whole system is a sign nobody’s thinking about risk at all.
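A tiered check does not need to be sophisticated. Here is a rough sketch, assuming you can get per-module coverage percentages out of your coverage tool. The tier names, module names, and measured numbers are all hypothetical, and the floors simply borrow the low end of the ranges above rather than any standard.

```python
# Hypothetical floors per risk tier (percent). Illustrative, not a standard.
TIER_FLOOR = {"critical": 85.0, "low": 60.0}

# Hypothetical assignment of modules to tiers.
MODULE_TIER = {
    "checkout": "critical",
    "auth": "critical",
    "logging_utils": "low",
    "admin_panel": "low",
}

# Hypothetical measured coverage per module (percent).
measured = {
    "checkout": 71.0,
    "auth": 92.0,
    "logging_utils": 65.0,
    "admin_panel": 62.0,
}


def tier_violations(measured, module_tier, tier_floor):
    """Return (module, tier, coverage, floor) for each module below its tier's floor."""
    violations = []
    for module, pct in sorted(measured.items()):
        tier = module_tier[module]
        floor = tier_floor[tier]
        if pct < floor:
            violations.append((module, tier, pct, floor))
    return violations


for module, tier, pct, floor in tier_violations(measured, MODULE_TIER, TIER_FLOOR):
    print(f"{module} ({tier}): {pct:.1f}% is below the {floor:.1f}% floor")
# checkout (critical): 71.0% is below the 85.0% floor
```

The useful part is the output: it names the module and the tier, not a single blended percentage. It still only measures execution, so it complements the assertion question rather than replacing it.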

And ask whether anyone acts on the gaps. The dirty secret is that plenty of teams measure coverage, dutifully note the deficiencies, and then ship anyway. A report that nobody acts on is just expensive decoration.

The reframe

None of this means coverage is worthless. It’s a useful baseline – it’s genuinely good at finding dead code and pointing at obvious untested gaps. The mistake is treating it as a quality score when it’s really just an execution map.

So the next time someone on your team – or in a vendor’s pitch – proudly reports a coverage percentage, ask the question that actually matters. Coverage of what? If they can immediately tell you that every critical, revenue-bearing, data-sensitive path is validated with real assertions, you’re in good hands. If they just repeat the percentage louder, you’ve learned something important about how much that number is really worth.

The goal was never a bigger number. It was confidence that the things that matter won’t break. Those are not the same thing, and the gap between them is where your next outage is quietly waiting.
