As a CIO, a lot of what you do is design stuff. When you aren’t designing, you’re overseeing other people who design stuff, or making sure the stuff everyone’s designing fits together the way it should.
There are some universal rules that govern good design no matter what’s being designed. The most famous is probably the great architect Louis Sullivan’s dictum that form follows function. Less well known, but just as important (at least in our context), is one introduced by W. Edwards Deming: To optimize the whole, we must suboptimize the parts.
This matters no matter what’s being designed, whether it’s a gadget, software, an organization, or a process. And it’s the key to understanding why so many CIOs get optimization wrong.
From queue to queue: The hidden process bottleneck
If CIOs could make a living on a single trick, process optimization would likely be it. It’s vital to IT performing its own role well, and a lot of what IT does for a living is to help business managers optimize their processes, too.
Process optimizers inside and outside IT have a wealth of frameworks and methodologies at their disposal. Lean is among the most popular, so let’s use that to illustrate the point.
Perhaps the most important but least recognized contribution Lean thinking has made to the world of process optimization is the insight that processes aren’t collections of tasks that flow from one box to the next box to the next.
Instead, they’re tasks that flow from queue to queue to queue. The difference may seem subtle, but it’s one reason optimizing a whole delivers different results from optimizing the parts of a whole. This may sound like academic hoo-ha, or an IT koan, but understanding this difference is key to mastering process optimization.
Hear me out.
Imagine you’re managing a project that needs a new server to proceed, assuming for the moment IT hasn’t gone full cloud and still owns servers and a data center. You follow procedure and submit a request to the IT request queue.
Oversimplifying a bit, the box-to-box view of what follows would look something like the figure below:
It’s a straightforward flow. The teams responsible for each step long ago optimized the procedures for addressing their responsibilities. In this view, the total effort and the process cycle time are the same: for this hypothetical example, figure about eight hours, or one day on the project schedule.
But the box-to-box view of the process is wrong. The actual process looks more like the following figure:
Each step in the process is managed as a first-in, first-out (FIFO) queue. Teams work on a request only after it has flowed through the queue and popped out for processing. The total effort is the same as estimated in the box-to-box view. But the cycle time includes both work time and time in queue: for this modeled process, five days, more or less.
The actual analysis is more complicated than this. Usually, one step ends up being a bottleneck: work stacks up in its queue while the other queues would run dry, were it not that every queue receives requests from more than one source. But that doesn’t change the principle, only the complexity of the simulation.
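For readers who prefer arithmetic to diagrams, here is a minimal sketch of the two views. The stage names, the hours per stage, and the backlog of four requests in each queue are hypothetical values, chosen only so the totals line up with the eight hours and five days mentioned above. It also assumes one worker per stage and that every queued request takes as long as ours, which real queues rarely oblige.

```python
from dataclasses import dataclass

HOURS_PER_WORKDAY = 8  # assumption: one project-schedule day is eight working hours


@dataclass
class Stage:
    name: str
    work_hours: float  # hands-on effort this stage spends on one request
    backlog: int       # requests already waiting in this stage's queue ahead of ours


def box_to_box_hours(stages):
    """Cycle time if work simply flowed from box to box: effort only."""
    return sum(s.work_hours for s in stages)


def queue_to_queue_hours(stages):
    """Cycle time when every stage is a FIFO queue with a single worker.

    Our request waits while each request ahead of it is worked,
    and only then receives its own hands-on time.
    """
    return sum(s.backlog * s.work_hours + s.work_hours for s in stages)


# Hypothetical numbers, for illustration only.
stages = [
    Stage("procurement", 2.0, 4),
    Stage("network administration", 1.0, 4),
    Stage("software installation", 2.0, 4),
    Stage("quality assurance", 2.0, 4),
    Stage("deployment", 1.0, 4),
]

effort = box_to_box_hours(stages)
cycle = queue_to_queue_hours(stages)
print(f"Effort: {effort:.0f} hours ({effort / HOURS_PER_WORKDAY:.0f} day)")
print(f"Cycle time: {cycle:.0f} hours ({cycle / HOURS_PER_WORKDAY:.0f} days)")
```

Nobody in this sketch works any slower in the second calculation. The extra four days are pure waiting, which is why tuning the individual steps does nothing to recover them.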
This is real, not just theory. Not that many years ago, a client whose queues were quite a bit longer than those depicted above experienced multi-month project delays as their teams waited for the installation of approved servers they were depending on. And yet a typical server required no more effort to acquire, configure, and install than what’s depicted above.
The root cause? The managers responsible for procurement, network administration, software installation, quality assurance, and deployment had all organized their departments’ work to maximize staff utilization and throughput.
They — the parts — had optimized themselves at the expense of each project’s whole.
The solution, which DevOps devotees will immediately recognize and embrace, was to include IT infrastructure analysts on the core project team, and, even more important, to include infrastructure tasks such as setting up servers in each project’s work plan, assigning start dates and due dates based on when their work products would be needed.
With this change, server builds became part of the project schedule instead of being externalities over which the project manager had no control.
In exchange, the CIO had to accept that if projects were to deliver their results on time and within their budgets, the rest of the IT organization would have to allow some slack in their work management. Staff utilization targets wouldn’t and shouldn’t even approach 100%. (Pro tip: Invest some time researching Eliyahu Goldratt’s Critical Chain project management methodology for a more in-depth understanding of this point.)
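Why slack matters can be sketched with a textbook queueing formula. This is a swapped-in illustration, not the client’s analysis and not Goldratt’s method: it assumes a single-server queue with random arrivals and random service times (what queueing theory calls M/M/1), where the average wait in queue is the service time multiplied by utilization / (1 - utilization). The function name and the two-hour service time are hypothetical.

```python
def mm1_queue_wait(utilization, service_hours):
    """Average hours a request waits in queue before work starts.

    Assumes an M/M/1 queue: one server, random (Poisson) arrivals,
    exponentially distributed service times.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and less than 1")
    return utilization / (1 - utilization) * service_hours


# Hypothetical: each request takes two hours of hands-on work on average.
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait = mm1_queue_wait(utilization, service_hours=2.0)
    print(f"{utilization:.0%} utilized: about {wait:.0f} hours in queue")
```

Under these assumptions the wait grows slowly at first and then explodes as utilization nears 100%, which is the arithmetic behind leaving slack in the schedule.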
The MBO meltdown
The optimization / suboptimization issue applies to much more than process design. Take, for example, management compensation.
Back in the day, Management by Objectives (MBO) was a popular theory of how to get the most out of the organization by getting the most out of every manager in the organization. Its fatal flaw was also a failure to recognize the inevitable but unintended consequences of optimizing the parts at the expense of the whole.
The way it worked — failed to work is a better way of saying it — was that, as the name implies, the company’s executives assigned each manager one or more objectives. Managers, given the improved clarity about what they were supposed to accomplish, set about accomplishing it with monomaniacal fervor, unimpeded by the distractions of what any other manager in the organization needed to accomplish their own objectives.
Modern organizations that suffer from what their inhabitants call “silo thinking,” and from the inability to collaborate that comes with it, are vestiges of the MBO era.
Helplessly helping the help desk
As someone once said (or really, as just about every manager has said whenever the subject comes up), there are no perfect org charts. Deming’s optimization / suboptimization principle is a key contributor to org chart imperfections.
Take the classic help desk and its position within IT’s organizational design. It has service-level targets for the delay between the first end-user contact and the help desk’s initial response, and for the time needed to resolve the end-user’s issue. Somewhere in there is also a goal of minimizing the cost per incident.
Figure that handling every reported incident includes time spent logging it, and either time spent trying to resolve it or time spent getting rid of it by handing it off to a different IT team.
The easiest way for the help desk to meet its initial response service level is to do as little as possible during the initial response, handing off every incident as fast as possible. This keeps help desk analysts free to answer the next call, and keeps them from getting bogged down trying to resolve problems they aren’t equipped to handle. Better yet, by directing problems to departments with more expertise, incidents will be resolved faster than if help desk analysts tried to solve them on their own.
Sadly, this approach also ensures help desk analysts never learn how to handle similar problems in the future. And while it also keeps the help desk’s costs down, it does so at the expense of distracting higher-priced talent from their current set of priorities, which, from the perspective of overall value, are probably more important.
Optimizing the help desk ends up as an exercise in unconstrained cost and responsibility shifting. The more the help desk’s own costs decrease, the more the total cost of incident management increases.
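The cost shift is easy to see with some back-of-the-envelope numbers. Everything in this sketch is hypothetical: the hourly rates, the time per incident, and the function itself are assumptions made up for illustration, not measurements from any real help desk. It even grants that specialists fix an incident in half the time, and it leaves out the cost of pulling them away from their other priorities.

```python
def incident_costs(incidents, handed_off, desk_rate, specialist_rate,
                   log_hours, desk_fix_hours, specialist_fix_hours):
    """Return (help desk cost, specialist cost, total cost) for a batch of incidents.

    Every incident is logged by the help desk. Incidents that are handed off
    are resolved by specialists; the rest are resolved by help desk analysts.
    """
    kept = incidents - handed_off
    desk_cost = desk_rate * (incidents * log_hours + kept * desk_fix_hours)
    specialist_cost = specialist_rate * handed_off * specialist_fix_hours
    return desk_cost, specialist_cost, desk_cost + specialist_cost


# Hypothetical rates and durations, for illustration only.
assumptions = dict(
    incidents=100,
    desk_rate=30,              # dollars per help desk analyst hour
    specialist_rate=90,        # dollars per specialist hour
    log_hours=0.25,            # time to log any incident
    desk_fix_hours=1.0,        # analyst time to resolve an incident
    specialist_fix_hours=0.5,  # specialists are assumed to be twice as fast
)

for handed_off in (20, 90):
    desk, specialist, total = incident_costs(handed_off=handed_off, **assumptions)
    print(f"{handed_off} handed off: help desk ${desk:,.0f}, "
          f"specialists ${specialist:,.0f}, total ${total:,.0f}")
```

With these made-up numbers, handing off 90 incidents instead of 20 cuts the help desk’s own cost by two-thirds while the total goes up by roughly a quarter. Different assumptions would give different totals; the point is only that the help desk’s scorecard can’t see the difference.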
To optimize the whole, you have to suboptimize the parts. This guidance might not sound concrete and pragmatic, but don’t let its esoteric overtones put you off. If you want the best results, make sure everyone involved in delivering those results knows what they’re supposed to be.
Also that nobody will be penalized for collaborating to make them happen.