I saw a post about a year ago talking about the “five whys” technique of
trying to figure out what caused something to fail. It was using a car
scenario for an example, and it went something like this:

The car didn’t start… because the battery is dead… because the
alternator wasn’t charging it… because the alternator belt broke…
because the belt was beyond its useful life but wasn’t replaced…
because it wasn’t maintained according to recommended schedule.

That’s about five levels, and it pretty much stopped there. I figure,
well, you can go beyond that, and in the case of the infra stuff at
a big enough company, you probably need to if you intend to actually try
to fix something.

So, that’s been my life: trying to roll back through the series of
actions (or lack of actions) to see how things happened, and then trying
to do something about it. The problem is that if you do this long
enough, eventually the problems start leaving the tech realm and enter
the squishy human realm.

Perhaps you’ve heard of the OSI model of networking, where you have
seven layers as a way to talk about what’s going on in the “stack”.
I’ve seen some brilliantly snarky T-shirts that talk about “layer eight”
and sometimes beyond as things like “corporate politics” and
“management” and all of that good stuff.

It turns out that when you start doing this root-cause analysis and
really keep after it, the “squishy human realm” is actually the
no-longer-hypothetical “layer eight” from those T-shirts.

In our “car” example, you might discover that management is forcing
people to ignore the maintenance schedule while saying things like
“it’ll work, trust me”. Or, they’re doing even worse things, like
ignoring safety codes that have been written in blood.

For those of us in tech, we tend to get off much more lightly than
people who do Actual Stuff in the Real World (like cars). Chasing down
our problems means you start getting into things like “empire-building
manager is hiring anyone with a pulse in order to look more important
by having more direct reports”. Maybe you chase that one down and you
get to “manager of manager is also into this whole thing, and benefits
from the equation”.

That might lead into “the entire company is obsessed with hiring even
though the tech equivalent of the Drake equation says there is no way
they can find anywhere near that many qualified people in the entire
world”.

What does that look like? Well, some people have no business
working on certain kinds of systems, whether as a transient situation,
or a permanent one. Transient situations come from a lack of training.
Permanent ones might come from attitudes or a genuine lack of ability
for whatever reason. Having the wrong person on the job is supposed to
be noticed and handled by the manager. If they don’t, that’s a failure.

Now, the team’s manager (M1) also has a manager (M2) of some kind, and
M2 is supposed to be making sure M1 can actually, well, manage! If they
can’t tell if that’s happening or not, that too is a failure.

In some situations, you come to realize that a whole bunch of bad things
happen due to non-technical causes, and they are some of the hardest
things that you might ever need to remove from an organization. Unlike
the line workers, management is in a whole different world in which the
“reality distortion field” matters most. You either generate a big
enough one yourself, or you slot into someone else’s. If you are
opposed to it, you are rejected.

I guess this is my way of warning anyone who fancies themselves a
troubleshooter and who really, truly, wants to get to the bottom of
things. If you do this long enough, expect to start discovering truly
unsatisfying situations that cannot be resolved.

Also, I will remind anyone who wants to try to tilt at such a windmill
that if you are given responsibility without the power to make any
changes, then you have just become the scapegoat. I said this in a post
way back in February 2013, and I *still* fell into that damn trap in
2017 within a particularly broken organization.

Finally, in this same vein, I wanted to share something that a reader
sent to me a while back, and that I found to be brilliant and amazing
(I still do, but I did then, too):
People can read their manager’s mind.

In particular, pay attention to where it says corollary 1 and starts
talking about the “insane employee”. The whole “personal offense”
thing? Yeah, if you have the ability to not become that person, try to
avoid it. Alternatively, if you’re cursed with the tendency to fall
into those things, try not to give yourself a hard time when someone
terrible takes advantage of you for the nth time.

Hang in there.
