Many information technology (IT) organizations use the Information Technology Infrastructure Library (ITIL) as a guiding framework for operational excellence. One of the many “best practices” within ITIL is the use of service level agreements (SLAs) and operational level agreements (OLAs). When implemented well, these agreements and their associated management processes go far in establishing fruitful working relationships between IT providers and customers. If implemented poorly, however, they can be a source of discomfort of which IT providers should be wary.
A service level or operational level agreement is an agreement between an IT provider organization and its customer regarding the provider’s performance relative to some meaningful measurement. In essence, the provider promises a specific process capability based on a particular metric. Normally, the agreement also contains various assumptions and conditions that must be met in order to achieve the agreed-to performance.
Committing to a Service Level Agreement
Surprisingly, when embarking on this best practice, some providers enter into agreements without fully understanding what they are able to deliver. The CIO of a major global financial institution stumbled into this lesson when he committed to an “average percent bad changes per month” of 18 percent in an SLA. (An average percent bad changes per month is the number of “bad” changes made divided by the total number of changes made, averaged over a calendar month with the data collected weekly.)
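The metric defined in the parenthetical above can be sketched in a few lines of Python. This is only an illustration of one reading of that definition (compute a bad-change rate for each week, then average the weekly rates within the month). The function names and the weekly counts are hypothetical and are not taken from the institution's data.

```python
from statistics import mean


def weekly_bad_rate(bad_changes: int, total_changes: int) -> float:
    """Fraction of the changes made in one week that were 'bad'."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return bad_changes / total_changes


def monthly_average_bad_rate(weekly_counts):
    """Average the weekly bad-change rates across one calendar month.

    weekly_counts: list of (bad_changes, total_changes) tuples, one per week.
    """
    return mean(weekly_bad_rate(bad, total) for bad, total in weekly_counts)


# Hypothetical counts for a four-week month (invented for illustration only).
example_month = [(10, 50), (5, 50), (15, 50), (10, 50)]
example_rate = monthly_average_bad_rate(example_month)
print(f"Average percent bad changes: {example_rate:.1%}")
```

Note that averaging weekly rates gives the same answer as pooling all changes in the month only when every week has the same number of changes. If weekly volumes differ, the two calculations diverge, so an SLA should state which one is meant.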
The 18 percent was derived from historical data that calculated the average bad change rate from the previous year. The intuitive logic – prevalent among many in the IT industry – was, “If that was my average across all of last year, I should have no problem getting below that this year. And to do that, I’ll stay below that average every month.” This metric and performance level target were put on the CIO’s performance plan. His operational approach was to review each core measurement monthly as “red,” “yellow” or “green.” Yellow was defined as exceeding the target by less than 1 percent, and red was defined as exceeding the target by 1 percent or more. Green was anything at or below the target. As part of the SLA, the status was shared monthly with the customer.
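The scorecard rule described above is simple enough to write down directly. The sketch below assumes that "1 percent" means one percentage point above the 18 percent target, which the article does not state explicitly. The function and parameter names are hypothetical.

```python
def scorecard_status(monthly_rate_pct: float,
                     target_pct: float = 18.0,
                     yellow_band_pct: float = 1.0) -> str:
    """Classify a monthly bad-change rate as green, yellow or red.

    Assumption: the yellow band is measured in percentage points,
    so with the defaults yellow covers rates above 18 and below 19.
    """
    excess = monthly_rate_pct - target_pct
    if excess <= 0:
        return "green"   # at or below the target
    if excess < yellow_band_pct:
        return "yellow"  # over the target by less than the band
    return "red"         # over the target by the band or more


# Illustrative inputs only, not the CIO's actual monthly scores.
for rate in (17.2, 18.0, 18.5, 19.0):
    print(rate, scorecard_status(rate))
```

The rule compares a single monthly number with a fixed line. Nothing in it accounts for how much the rate normally moves from month to month, which is the gap the rest of the story exposes.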
The first month’s score for this measurement was red. Characteristically, an enormous amount of energy was thrown at the situation, including long nights and heated conversations. The second month’s score was green and everyone congratulated each other on a job well done. The customer especially showed his thanks for the IT organization “pulling out all the stops” to fix the problem. The third month’s score also was green, confirming that everything was all right. But the fourth month’s score went red again, followed, of course, by another insertion of significant energy. This flip-flopping from red to green occurred for a total of six months until the CIO agreed to allow a Master Black Belt to look into the situation.
Reevaluating the SLA
The nature of all processes is that their outputs vary. Understanding this variation, in addition to averages, is one of Six Sigma’s core competencies. Accordingly, the Master Black Belt plotted the percent-of-bad-changes data on a control chart (Figure 1).
Figure 1: Data was collected weekly and reported monthly. Green lines indicate monthly averages.
Each dot in Figure 1 shows the percent of bad changes that occurred during each week. The green lines indicate the average percent of bad changes for each month, which were reported on the CIO’s scorecard. Here is “process physics” in action, as there is significant variation around the target 18 percent. Middle managers and team leaders can probably appreciate the tension delivered into the organization whenever a week’s calculation was above the target level.
Continuing the investigation, the Master Black Belt decided to take a slightly different view by summarizing the monthly averages into another control chart (Figure 2). Each dot in Figure 2 is equivalent in value to the green lines in Figure 1.
This chart is a bit clearer in showing the monthly average percent of bad changes relative to the target of 0.18. It also shows that, after six months, the process had an accumulated average of 0.1857, or 18.57 percent. When overlaying the CIO’s red, yellow and green criteria (Figure 3), the reason for flip-flopping from red to green was both obvious and predictable.
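The article does not say which type of control chart was used for Figure 2. As a rough sketch, the snippet below assumes an individuals (XmR) chart, whose limits are the mean plus or minus 2.66 times the average moving range. The monthly values are invented for illustration and are not the data behind the figures. With only six points, any such limits are provisional.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an XmR chart.

    Limits are the mean plus or minus 2.66 times the average moving
    range between consecutive points (the standard XmR constant).
    """
    if len(values) < 2:
        raise ValueError("need at least two points")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    lower = max(0.0, center - spread)  # a rate cannot fall below zero
    return lower, center, center + spread


# Hypothetical monthly averages (invented; NOT the data in Figure 2).
monthly_rates = [0.20, 0.17, 0.16, 0.21, 0.17, 0.20]
target = 0.18

lcl, center, ucl = individuals_chart_limits(monthly_rates)
print(f"center={center:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
print("target lies inside the control limits:", lcl < target < ucl)
```

When the target sits inside the control limits and close to the center line, routine variation alone will put some months above it and some below it, whatever effort is spent in between.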
As both the process owner and Master Black Belt understood, the measurement would continue behaving in this manner until there was a significant change to the process’s systemic flaws. To that point, the organization had been “tampering” with the system, that is, reacting to routine variation as though each red month had its own special cause, which, mathematically, could not lead to long-term success. Unfortunately for the CIO, committing to an average ignored the reality of process variation. The CIO’s commitment embarrassed the company in front of a customer and, together with his operational review scheme, made life extraordinarily difficult for his organization. So how could the situation have been different? How could an understanding of Lean Six Sigma have helped this organization?
Knowing Capabilities Before the SLA
A basic understanding of systems thinking and process behavioral math would have been beneficial before entering into the SLA. With this understanding, the CIO would most certainly have worded the SLA to reflect both the average and the variation of the process, selecting a number that takes the upper control limit into account. Providing this information to the customer would have gone a long way toward managing customer expectations. Also, one could hardly expect proactive management through red-yellow-green dashboards. Control charts are just as easy to create, and they avoid all of the pain associated with chasing red unnecessarily.
After gaining some background on systems thinking, the CIO washed away the egg on his face and avoided permanent career damage. Not surprisingly, a new policy was instituted requiring all measurements to be charted and understood prior to SLA/OLA commitment. Reasonable simulation and other Lean Six Sigma analyses were deemed acceptable if no prior data was available.
As the CIO discovered, simply implementing an ITIL best practice is not enough. Leaders must understand how to implement the best practices effectively within their unique circumstances. A solid Six Sigma program provides that understanding.