Most software and IT organizations have great difficulty measuring organizational efficiency and effectiveness, despite a bewildering array of metrics that have been proposed and occasionally used. However, a basic-yet-powerful set of metrics that gets to the heart of these issues does exist, and at the same time facilitates the application of Six Sigma.
The objective here is to measure effectiveness from the perspective of execution. The concerns addressed are cost, quality and cycle time.
Where Does the Time Go?
Labor, including both staff and contractors, is nearly always the largest element of cost in software or IT organizations. Hence, understanding and improving the use of labor is fundamental to any improvement strategy. Many software/IT groups have systems and processes in place to track and report labor use against a multi-level chart of accounts that may include dozens, hundreds or even thousands of charge categories. A chart of accounts of typical size for applications development and support is illustrated in the following outlines:
Applications Development
- Line of business
  - Project (dozens to hundreds)
    - Life cycle phase (e.g., requirements, design, etc.) (5 to 10)
      - Task (tens to thousands)

Applications Support
- Line of business
  - Application (dozens to hundreds)
    - Activity (e.g., defect fix, question, data correction, etc.) (perhaps 10)
At first glance, one might think that an organization with this much data would be well positioned for Six Sigma – not! Here is why:
1. Reliability of the data is inversely proportional to the number of charge categories. More categories mean more entries for each individual to report and more time spent reporting. Most individuals enter time weekly or even less often, and cannot accurately remember how time was actually spent. Many individuals do development and support activities in the same week, and work on several development projects, including several phases and tasks within each. Commonly one person will spend time on dozens of different charge categories.
2. Many de facto incentives exist that distort the data. To mention just one, it is common practice to stop charging time to a project when the budget has run out. Instead, time is charged to something that has remaining budget, independent of the actual work being performed. Similar distortions occur between phases. The infamous “hear no evil, see no evil” game prevails. No one really wants to know the truth.
3. There is usually no feedback to individuals on their time reports. Typically no decisions result; hence, widespread cynicism becomes entrenched: “Nobody uses this data, so who cares?” Non-compliance and distortion become the norm.
Less Is More
From an efficiency improvement perspective (not necessarily from a project management perspective), task-level time detail is not useful. That is because such data is not comparable across projects. What is common to all projects, and far more useful for measuring effectiveness, is a cost-of-quality view of time expenditures. For software and IT activities, the measurement system can be a simplified three-part cost-of-quality scheme consisting of appraisal, rework and value-added. In many instances, this approach means less time reporting and fewer items in the chart of accounts.
1. Appraisal is all time spent finding defects. In most organizations appraisal is primarily testing, but may also include inspections and reviews of various work products prior to testing. Most organizations devote 30 percent to 50 percent of total effort to appraisal, but usually have no idea whether that is enough or too much. Testing usually stops when time runs out, with little or no insight into the effectiveness of the effort expended. The essential effectiveness metric for all types of appraisal is cost per defect (total cost of the appraisal effort divided by the total number of defects found). A company should understand this metric for each type of appraisal so it can choose the most cost-effective combination. In practice, early appraisals such as inspections typically have a significantly lower cost per defect, commonly three to 20 times less than testing. (Defect tracking, a second essential source of metrics, is addressed below.)
2. Rework is all time spent fixing defects. It includes defects found by any form of appraisal and/or by customers after a system is delivered. Typically, 30 percent to 50 percent of total effort is spent on rework, but it is rarely measured. Most organizations cannot separate appraisal effort from rework. The essential effectiveness metric for rework is cost to fix a defect (or total cost of rework divided by the total defects fixed). It is important to remember that all work done to fix an application after it has been delivered to a customer is rework. That may include corrections to features or functions that are incorrect, but also may include “missed requirements” – things the customer expected but did not receive.
3. Value-added is simply the total time spent minus time spent on appraisal and rework. Improving effectiveness means increasing value-added. Any and all improvement initiatives can set goals and measure improvements in terms of their impact on value-added. When measured, which rarely occurs, value-added is commonly about 30 percent of total effort. When cancelled projects and projects that were delivered but not used are considered, value-added may be even less. Value-added is the central big Y for all software and IT activities.
It should be noted that this definition of value-added is offered for the sake of “operational definition” simplicity. It does not exclude the possibility that some effort categorized as value-added may be redundant or unnecessary. However, the elephant needs to be eaten one bite at a time. When a company gets value-added to 50 percent, it can begin to consider refinements of these categories. Getting these three cost-of-quality categories implemented will provide plenty of challenge for at least the first year.
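To make these definitions concrete, here is a minimal sketch of how the three cost-of-quality metrics might be computed from reported hours and defect counts. The figures and the loaded labor rate are purely illustrative assumptions, not benchmarks.

```python
# Minimal sketch of the three-part cost-of-quality split described above.
# All figures (hours, defect counts, loaded labor rate) are illustrative
# assumptions; a real implementation would pull them from the time-reporting
# and defect-tracking systems.

def cost_of_quality_summary(total_hours, appraisal_hours, rework_hours,
                            defects_found, defects_fixed, loaded_rate):
    """Return the basic effectiveness metrics for one project or period."""
    value_added_hours = total_hours - appraisal_hours - rework_hours
    return {
        "appraisal_cost_per_defect": appraisal_hours * loaded_rate / defects_found,
        "rework_cost_per_defect": rework_hours * loaded_rate / defects_fixed,
        "appraisal_pct_of_effort": 100.0 * appraisal_hours / total_hours,
        "rework_pct_of_effort": 100.0 * rework_hours / total_hours,
        "value_added_pct_of_effort": 100.0 * value_added_hours / total_hours,
    }

# Example: 10,000 total hours, 40% appraisal, 35% rework, 800 defects.
for name, value in cost_of_quality_summary(
        total_hours=10_000, appraisal_hours=4_000, rework_hours=3_500,
        defects_found=800, defects_fixed=800, loaded_rate=100.0).items():
    print(f"{name}: {value:,.1f}")
```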
Understanding Defects
Tracking defects means recording every defect found by a customer or by any formal appraisal process. That does not mean asking individuals to track every change – unit tests, individual code reviews and walkthroughs are usually not subject to tracking. Once an author declares a work product “complete,” however, and releases it for independent appraisal by others, all defects found should be tracked.
Every defect found should be identified with the following information:
- Work product in which the defect was found. (This may need to be determined by the person doing the fix.)
- Appraisal method used to find the defect – inspection, type of testing, customer, etc.
- Origin of defect (i.e., where it was “inserted” – requirements, design, coding, etc.) (It is not always possible to determine the origin, but an adequate sample is usually feasible.)
- Project ID and life cycle phase (if during the project) or application ID in which the defect was found (if after release).
- Other information necessary to track “assigned to,” “status” and “closing information.” (This is not necessarily needed for process improvement, but typically is required for management purposes.)
- Defect type – a short list of orthogonal categories that make it possible to determine which type of defect is most effectively found by which type of appraisal.
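The following sketch shows one way these fields might be captured as a simple record structure. The field names and example values are illustrative assumptions; any defect-tracking tool that captures the same attributes will do.

```python
# Sketch of a defect record carrying the fields listed above. Field names and
# example values are illustrative assumptions, not a standard schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DefectRecord:
    work_product: str             # artifact in which the defect was found
    appraisal_method: str         # "inspection", "system test", "customer", ...
    origin_phase: Optional[str]   # "requirements", "design", "coding", or None if unknown
    found_in: str                 # life cycle phase, or "post-release"
    project_or_app_id: str        # project ID in-project, application ID after release
    defect_type: str              # one of a short list of orthogonal categories
    assigned_to: Optional[str] = None   # management fields, not needed for process improvement
    status: str = "open"
    closing_info: Optional[str] = None
```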
Tools such as a defect cost scorecard, defect containment effectiveness (DCE) and total containment effectiveness (TCE) metrics can be applied to manage differential effectiveness across phase of origin and detection.
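As a rough illustration, the sketch below computes DCE and TCE from (origin, found) pairs under one common formulation – DCE as the fraction of a phase's defects caught in that same phase, and TCE as the fraction of all defects caught before release. Organizations define these metrics in slightly different ways, so treat the formulas as assumptions to be adapted.

```python
# Sketch of defect containment effectiveness (DCE) and total containment
# effectiveness (TCE) under one common formulation; exact definitions vary,
# so treat these formulas as assumptions to be adapted locally.

def containment_metrics(defects):
    """`defects` is an iterable of (origin_phase, found_in) pairs, with
    found_in == "post-release" for defects reported by customers."""
    inserted, contained, found_pre_release, total = {}, {}, 0, 0
    for origin, found in defects:
        total += 1
        inserted[origin] = inserted.get(origin, 0) + 1
        if found == origin:                     # caught in its phase of origin
            contained[origin] = contained.get(origin, 0) + 1
        if found != "post-release":             # caught before delivery
            found_pre_release += 1
    dce = {phase: contained.get(phase, 0) / count
           for phase, count in inserted.items()}
    tce = found_pre_release / total if total else 0.0
    return dce, tce

# Illustrative data only: (phase of origin, where found).
sample = [("requirements", "requirements"), ("requirements", "system test"),
          ("design", "design"), ("coding", "integration test"),
          ("coding", "post-release")]
print(containment_metrics(sample))
```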
Understanding ‘Size’
Chief information officers and others often claim that their actual-to-estimated effort and schedule variance is only 5 percent or 10 percent. Try as one may to keep a straight face when hearing that, those in the know in the software/IT business usually grin in spite of themselves. Independent industry evidence (e.g., the Standish Group Chaos reports) consistently indicates that average project overruns are more like 100 percent. The secret ingredient that reconciles these divergent perspectives is a measure of size – both as estimated and as delivered.
Meaningful assertions about variance must include normalization for changes in size. A project that cost twice as much as planned, but delivered twice as much functionality, should be understood to have had zero variance. Similarly, a project that was on budget but delivered only 50 percent of the intended functionality should be understood to have had a 100-percent variance. Few organizations today understand real variance.
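A minimal sketch of this normalization, using the two scenarios above: real effort variance is the change in cost per unit of delivered size relative to cost per unit of planned size. The size unit (function points, stories, etc.) is left open; what matters is that it is applied consistently.

```python
# Sketch of size-normalized effort variance, following the reasoning above:
# compare cost per unit of delivered size with cost per unit of planned size.
# The size unit (function points, stories, etc.) is an assumption; any
# consistent measure works.

def normalized_effort_variance(planned_cost, planned_size,
                               actual_cost, delivered_size):
    planned_rate = planned_cost / planned_size
    actual_rate = actual_cost / delivered_size
    return (actual_rate - planned_rate) / planned_rate

# Twice the cost, but twice the functionality delivered: 0% real variance.
print(normalized_effort_variance(100, 500, 200, 1000))   # 0.0
# On budget, but only half the functionality delivered: 100% real variance.
print(normalized_effort_variance(100, 500, 100, 250))    # 1.0
```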
Putting It All Together – An Effectiveness Dashboard
The following indicators make a good effectiveness dashboard. They are intended to be used in three perspectives – baseline (values at the start), trend (changes in values over time, reflecting aggregate impact of all interventions) and pre/post intervention (reflecting the impact of a specific change or improvement under reasonably controlled conditions, making an effort to isolate individual effects).
- Appraisal cost per defect by phase and appraisal type (by project and in aggregate)
- Rework cost per defect by phase and appraisal type (by project and in aggregate)
- Value-added, appraisal and rework as a percentage of effort (by project and in aggregate)
- Defect containment effectiveness (by project and in aggregate)
- Total containment effectiveness (by project and in aggregate)
- Effort variance normalized for size (by project and in aggregate)
- Schedule variance normalized for size (by project and in aggregate)
- Defect density, or defects per size – total “insertion” rate (by project and in aggregate)
- Effort per size, or productivity (essential to consider variations in “schedule pressure”)
- Duration per size, or cycle time (essential to consider variations in “schedule pressure”)
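As a small illustration of the last three dashboard rows, the sketch below normalizes defects, effort and duration by size. Function points are used here purely as an assumed size unit, and the input figures are invented.

```python
# Sketch of the size-normalized dashboard rows: defect density, effort per
# size and duration per size. Function points are an assumed size unit and
# the figures are invented.

def size_normalized_indicators(total_defects, effort_hours, duration_days, size_fp):
    return {
        "defects_per_fp": total_defects / size_fp,
        "effort_hours_per_fp": effort_hours / size_fp,
        "duration_days_per_fp": duration_days / size_fp,
    }

print(size_normalized_indicators(total_defects=800, effort_hours=10_000,
                                 duration_days=180, size_fp=1_200))
```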
Based on experience with sustained application of these metrics, it is typically possible for an organization to shift 10 percent to 20 percent of non-value-added work to value-added within one to two years. It is not easy, but the payoff potential is very large.