Understanding mean time between failures (MTBF) is a way of expecting the unexpected. Every kind of physical tool, machine or device will eventually fail. Friction of physical parts rubbing against each other, degradation of certain materials or misalignment between different components can all contribute to system failures. That’s why every active car needs tune-ups and maintenance no matter how well the owner takes care of it.
Overview: What is mean time between failures?
The most basic definition of mean time between failures is a mathematical one. Typically, the measurement is derived by dividing the total number of uptime hours by the number of system failures. Despite its apparent simplicity, using this equation and applying its results effectively isn’t always straightforward.
For the number to make sense, a business needs clear, specific definitions of “uptime” and “failure.” Many breakdowns or partial failures compromise system function only slightly. Companies need to understand their processes and equipment well enough to know which disruptions are tolerable and which are unacceptable in different circumstances.
3 benefits of MTBF
The power of metrics like MTBF is in the perspective they provide. They are how leaders keep their finger on the pulse of many different processes across their company.
1. Sensitivity to volatility
One of the most immediate benefits of monitoring mean time between failures for your physical systems is managing volatility. A core aim of Six Sigma is to reduce variability in the results of business processes, leading to a more controlled and efficient workflow. Tracking MTBF over time supports that aim, because an erratic or falling figure shows where equipment reliability is becoming less predictable.
2. Planning for failure
It’s virtually impossible to predict when and how specific breakdowns will occur in mechanical systems. Different parts may give way at unexpected times due to small fluctuations in the machine’s operation. Calculating MTBF allows companies to make educated guesses about future maintenance needs even though they can’t predict specifics.
3. Simple benchmarks
Another reason to care about this metric is its use as a basic benchmark. When comparing similar systems or equipment with those owned by another company, business leaders can readily gauge their performance in certain key areas.
Why is mean time between failures important to understand?
As with any self-generated metric, leaders need to invest quality input if they want quality output.
Figuring out failure – Some kinds of failure are obvious. If a machine suddenly starts spewing smoke and shuts off, it has clearly failed. However, if the same machine starts making weird noises and is going about 5% slower, it’s not as clear if this time counts as “failure” or “uptime.” Actually defining failure depends on how breakdowns impact value-added activities and the overall quality or efficiency of the result.
Useful in context – Like any kind of business metric, MTBF is not a useful number on its own. Building data about equipment failures can provide valuable insight into your own processes, but only if it’s supported with further investigation, additional data sources and real action.
Inherent and instigated failure – It’s important to understand that mean time between failures is only meant to describe disruptions that arise from the machine’s own operations. Inherent failures are inevitable in any system and usually take many different forms. Instigated failure, which would be any scheduled or deliberate disruption of operations, should not be included in the calculation.
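One way to keep instigated stoppages out of the figure is to tag each stoppage as planned or unplanned before calculating. The following is a minimal Python sketch under stated assumptions: the Stoppage record, its planned flag and the sample numbers are hypothetical, and subtracting planned stops from available time (rather than counting them as uptime) is one reasonable convention, not a rule stated in this article.

```python
from dataclasses import dataclass


@dataclass
class Stoppage:
    """Hypothetical log entry for one interruption of the machine."""
    hours_down: float
    planned: bool  # True for scheduled or deliberate stops (instigated)


def mtbf_inherent(scheduled_hours: float, stoppages: list[Stoppage]) -> float:
    """MTBF in hours, counting only unplanned (inherent) failures."""
    inherent = [s for s in stoppages if not s.planned]
    if not inherent:
        raise ValueError("MTBF is undefined without at least one inherent failure")
    # Assumption: the machine is not 'up' during any stoppage, planned or not.
    total_down = sum(s.hours_down for s in stoppages)
    uptime = scheduled_hours - total_down
    # Only inherent failures go in the denominator.
    return uptime / len(inherent)


# Illustrative data only: two breakdowns and one scheduled cleaning in a 40-hour week.
log = [Stoppage(1.0, False), Stoppage(2.0, True), Stoppage(1.0, False)]
print(mtbf_inherent(40.0, log))  # 18.0
```

If the scheduled cleaning were counted as a failure, the same log would give 12 hours instead of 18, which understates how reliable the machine is on its own.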
An industry example of MTBF
A fast food restaurant serves ice cream for 10 hours a day, from noon until closing time at 10 PM, seven days a week. This means there is a maximum uptime of 10 hours a day or 70 hours a week. However, despite the maintenance done to the machine before daily operations, it’s still possible for the equipment to fail during the shift.
Since the establishment serves fast food, the machine needs to be able to generate an entire serving within a minute to satisfy speed requirements and be able to do so again immediately after if another customer happens to order. With this in mind, leaders can say that the equipment has officially “failed” if it cannot maintain production of at least 1 serving of ice cream per minute.
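A definition like this can be turned into a simple rule for counting failure events. The sketch below is a hypothetical Python illustration: it assumes someone records servings per minute, and it treats each unbroken run of below-threshold minutes as a single failure. The function name and the sample readings are invented for illustration.

```python
def count_failure_events(servings_per_minute: list[float], minimum: float = 1.0) -> int:
    """Count failure events, where one event is a continuous run of
    minutes in which output falls below the required minimum."""
    events = 0
    in_failure = False
    for rate in servings_per_minute:
        if rate < minimum:
            if not in_failure:
                events += 1  # a new failure starts here
            in_failure = True
        else:
            in_failure = False  # machine is back above the threshold
    return events


# Illustrative readings only: two separate dips below 1 serving per minute.
readings = [1, 1, 0, 0, 1, 0.5, 1]
print(count_failure_events(readings))  # 2
```

Grouping consecutive slow minutes into one event keeps a single long breakdown from being counted as many failures.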
To make the actual calculation, the owners must measure uptime versus downtime. In the first week, the machine was down for a total of 8.5 of the 70 hours it should have been working. This means its total uptime was 61.5 hours. There were also a total of 7 individual failure events associated with this lost time.
This means they need to divide 61.5 hours by 7 to get their final MTBF number, which is about 8.8 hours in this case.
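The same arithmetic can be written as a short Python sketch using the figures from this example. The function and parameter names are illustrative, not a standard API.

```python
def mtbf_hours(scheduled_hours: float, downtime_hours: float, failures: int) -> float:
    """Mean time between failures: total uptime divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    uptime_hours = scheduled_hours - downtime_hours
    if uptime_hours < 0:
        raise ValueError("downtime cannot exceed scheduled hours")
    return uptime_hours / failures


# 10 hours a day for 7 days, 8.5 hours of downtime, 7 failure events.
weekly_mtbf = mtbf_hours(scheduled_hours=10 * 7, downtime_hours=8.5, failures=7)
print(round(weekly_mtbf, 1))  # 8.8
```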
3 best practices when thinking about mean time between failures
You should always follow basic best practices when collecting or utilizing data in process management to avoid common pitfalls.
1. Coordinate maintenance
One of the best ways to utilize this metric is to establish a proactive system and framework for effective maintenance. Estimating the rate of breakdowns in different systems helps companies optimize how they deploy personnel and assets.
2. Plan for failure
Sometimes it’s better to plan for failure rather than fight it. Inherent failures and maintenance needs are inevitable in any physical system or device, so it’s something companies need to accept, embrace and integrate into their process cycle.
3. Continue to improve
MTBF isn’t just a mechanism for aiding continuous improvement; it should also be the subject of that improvement. There are almost always ways you can improve the methodology and scope of data collection. It’s hard to have too much information when you want to truly take control of your processes.
Frequently Asked Questions (FAQ) about mean time between failures
What’s the difference between MTBF and MTTR?
Mean time between failures (MTBF) is closely related to mean time to repair (MTTR), but they aren’t the same. MTBF describes how long a machine typically runs between failures, while MTTR describes the average time it takes to repair the machine and return it to service after a failure.
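For comparison, MTTR is commonly computed as total repair time divided by the number of repairs. The Python sketch below reuses the ice cream example and assumes, purely for illustration, that all 8.5 hours of downtime were spent on repairs; the example in this article does not state that split. The function name is hypothetical.

```python
def mttr_hours(total_repair_hours: float, repairs: int) -> float:
    """Mean time to repair: total repair time divided by number of repairs."""
    if repairs <= 0:
        raise ValueError("MTTR is undefined without at least one repair")
    return total_repair_hours / repairs


# Assumption: each of the 7 failures led to one repair, totaling 8.5 hours.
print(round(mttr_hours(8.5, 7), 1))  # 1.2
```

Read together with the MTBF of about 8.8 hours, this would mean the machine typically runs for most of a shift and then needs a little over an hour of repair.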
How do you calculate MTBF?
Mean time between failures is the result of dividing total uptime by the number of failures. Total uptime is the result of subtracting downtime from total possible uptime.
What is MTTF?
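Mean time to failure (MTTF) is closely related to MTBF, but the two are not identical. MTTF is generally used for items that are replaced rather than repaired when they fail, while MTBF applies to repairable systems that return to service after each failure.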
Rates and rotations
Almost everything in business is cyclical. Some cycles repeat dozens of times every day, while others turn slowly over the course of years. In either case, MTBF is one of the many metrics that help people step back and see the bigger picture with new perspective, scope and context.