The canary is still singing

The old metrics still sing in tune. The danger they were guarding against has changed, and they're not the canary for the new one.

The application compiles and all tests pass. I scroll back through the diff and realize that right now I couldn't explain to a colleague what that second loop is for.

It's not a large function. Forty lines, maybe. Claude wrote it after I described what I wanted. The change works, and the test I added (well, Claude added) hits the obvious paths. By every check I know of, this would be a clean PR.

I'm not opening it yet.

Software's own canaries

Miners used to bring canaries down into the mine because the birds breathed faster and reacted to carbon monoxide sooner than humans did. As long as the canary kept singing, the air was safe. When the singing stopped, you climbed as fast as you could.

Software has always had its own canaries. Things you watched as a developer. The number of stories with bugs per sprint. Team velocity. Test coverage. They weren't perfect, but they were the canaries we used to check whether things were still on track. When one of them went quiet, something was wrong, and you could usually find out what.

The question is whether those canaries still work.

Still singing in tune

Take velocity. For most of my career, velocity was a rough measure of speed: how quickly a team could ship new functionality. Now Claude produces a thousand lines of working code in an hour. Velocity has come loose from understanding. A sprint that closes twice as many tickets doesn't mean the team thought twice as hard. It might mean the team thought half as much.

Test coverage is another metric that no longer says much. Coverage was always a proxy: tests on the edges meant someone had thought about those edges. Now the same model that writes the code also writes the tests. A high coverage number tells you that Claude was asked to write tests. It doesn't tell you that anyone considered what shouldn't work, or which scenarios were even thinkable. Just like before, the author of the code and the author of the tests are one and the same. It used to be the developer. Now it's Claude. And Claude doesn't probe the edge cases a skeptical human would.

Or look at bug counts. The first impression is that AI brings them down: known patterns get fixed faster. The real picture is different. Codebases grow faster now, and bugs lag the writing by months, or only show up after a certain critical mass. The bug count today reflects the code from before the acceleration. The line on the chart can look healthy right up to the moment it doesn't.

These were never perfect canaries. But the song they sang used to mean something. Now they sing in tune while we no longer know the mine itself.

What's left

When I sat with those forty lines and couldn't explain the second loop, what warned me wasn't a hard, measurable metric. It was the distance between the code I had written (well, Claude had) and the code I understood. My understanding was narrow, and the gap between the code and that understanding was wide.

That distance is the canary.

If I can explain a change to another developer, I still know the mine. If I can't, I'm walking by feel. If I catch myself thinking it works because Claude said so, the bird has stopped singing.

The indicator has moved from hard numbers on a dashboard to the developer. That's uncomfortable. Dashboards are easy to share and easy to audit in a review meeting. A claim about your own understanding of the code is harder to verify. But it's also harder to fake. You can manipulate a coverage number. It's much harder to bluff your way through five minutes of explaining code you never actually understood.

Before the merge

Before I open a pull request, I look at the diff and work out why each change is what it is. If I can't, I don't open the PR. I go back and read. Sometimes I rewrite. Sometimes I find that what Claude proposed and what I needed had quietly drifted apart while everything kept compiling.

It's the only way I have right now to catch what tests and hard metrics can't: code that works without anyone understanding what it does.

The canary is still there. It just sings inside your head now. The trick is to hear it before you open the PR.

Marco Geuze

Building software for 30+ years. Founder of GDK Software.