How blame stops your debugging
I want to start by stepping away from incidents for a second.
Let’s say you're in the kitchen with a friend. You’re cooking together, chatting, chopping, stirring. At some point, your friend cuts their finger while chopping onions. The first thing you do is obvious: mitigate the impact. You grab a paper towel, help them rinse the cut, and find a band-aid.
Now imagine that right after you put on the band-aid, you say some version of: “Well, be a little more careful with that knife.”
In that very moment, you close off an important connection, because you’ve decided the “cause” is that they weren’t careful enough. As a result, you may never find out that the knife has a small notch in the blade that makes it slip sometimes, or that your friend’s attention was low because of a hard conversation earlier, or that you were both rushing because the pan was already hot and you wanted to get the onions in before they burned.
You might only see some of that later, when it happens to you.
“Just be more careful” feels like an answer, but it trades away the chance to understand what’s really going on in exchange for a simple story.
What a simple and illustrative example.
SRE culture has taught me some wonderful things. Of these, the good post-mortem is certainly the one that has stuck with me and served me well across various professional scenarios.
This only works in a trusting culture. This only works in a competent culture. This only works when you want to build a competent, trusting culture.
What's more worthy of this than your own family? I can't believe I didn't realize this until today.