Key Points
- Variable data is data that is measured.
- It can be derived from anything that you measure within your processes.
- It is also called continuous data and generally needs a smaller sample size than discrete data; large counts can sometimes be treated as pseudo-continuous data.
In other articles, we’ve discussed discrete data, attribute data, and continuous data. Now it’s time to talk about variable data. Let’s look at what variable data is, contrast it with some of the other types of data, and suggest some best practices for dealing with variable data.
What Is Variable Data?
Simply put, variable data is the value you get when you measure something with a measuring device (scale, tape measure, stopwatch, etc.) that (1) can take on any value over a continuum of possible values and (2) can be logically subdivided given the resolution of the measuring device.
The term continuous data is used interchangeably with the term variable data. Some examples are weight, volume, time, length, and speed. All are measured, can take on any value, and can be logically subdivided into smaller and smaller units.
By contrast, discrete or attribute data are counted, not measured. You can have 5 people, 10 boxes, or 10 invoice errors. It makes no sense to talk in terms of 5.3 people or 10.636 errors on an invoice.
There is one situation in which you take counts but can treat the data as what we call pseudo-continuous, or variable, data. This occurs when the counts are large, span a wide range of values, and are distributed across that range.
For example, suppose you want to know the average number of cases of a product produced per day, so you count each day’s production for a month. A case count is, by definition, discrete data, but in most situations each day’s count would be a large number. If the daily counts span a wide range (deciding what counts as "wide" is more a judgment call than an objective rule) and the values are spread across that range, you might decide to treat the data as variable data and use the appropriate variable data statistical tools.
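The decision above is ultimately a judgment call. As a rough illustration only, the screen below turns the three conditions (large counts, a wide range, values spread across that range) into numeric checks. The function name and every threshold (`min_count=100`, a 10% relative range, 80% distinct values, 60% of bins occupied) are hypothetical assumptions chosen for the example, not a standard rule.

```python
from statistics import mean


def looks_pseudo_continuous(counts, min_count=100, min_relative_range=0.1,
                            min_distinct_ratio=0.8, bins=5, min_bins_filled=0.6):
    """Hypothetical screen for treating counts as pseudo-continuous data.

    All thresholds are illustrative assumptions, not a published rule.
    """
    if len(counts) < 2:
        return False
    lo, hi = min(counts), max(counts)
    # Condition 1: large counts (every value at least min_count).
    if lo < min_count or hi == lo:
        return False
    # Condition 2: a wide range relative to the average count.
    wide = (hi - lo) / mean(counts) >= min_relative_range
    # Few repeated values, as you would expect from a measurement.
    many_values = len(set(counts)) / len(counts) >= min_distinct_ratio
    # Condition 3: values spread across the range (share of equal-width bins occupied).
    width = (hi - lo) / bins
    occupied = {min(int((c - lo) / width), bins - 1) for c in counts}
    spread_out = len(occupied) / bins >= min_bins_filled
    return wide and many_values and spread_out


# Made-up daily case counts for illustration.
daily_cases = [1180, 1245, 1302, 1150, 1275, 1220, 1330, 1195, 1260, 1310]
print(looks_pseudo_continuous(daily_cases))
print(looks_pseudo_continuous([2, 3, 2, 4, 3]))  # small counts fail the first check
```

Treat a `True` result as a prompt to look at the data more closely, not as a verdict; a histogram of the counts is still the best check.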
Benefits of Variable Data
While you might not have a choice of the type of data you can collect, you should strive to use variable data as often as you can.
Sample Size
Variable data generally requires a smaller sample size than discrete data to provide a good understanding of the underlying distribution.
Resolution
By being able to subdivide the data into smaller and smaller logical values, you gain greater resolution, which allows you to distinguish between values. If you could only measure in units of 1 foot, you couldn’t discriminate between 2 inches and 5 inches, or between 4 inches and 8 inches. All four lengths fall within the same 1-foot unit, so the device would record them identically.
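A small sketch of the resolution point, assuming a simple model of a coarse device that reports which whole unit a length falls into (a floor). The `record` helper is hypothetical and exists only for this illustration.

```python
import math


def record(length_in, resolution_in):
    """Hypothetical device model: report the index of the resolution-sized
    unit that the true length falls into (floor)."""
    return math.floor(length_in / resolution_in)


lengths = [2, 4, 5, 8]  # true lengths in inches

at_one_foot = [record(x, 12) for x in lengths]  # 1-foot resolution
at_one_inch = [record(x, 1) for x in lengths]   # 1-inch resolution

print(at_one_foot)  # [0, 0, 0, 0]: all four lengths look the same
print(at_one_inch)  # [2, 4, 5, 8]: every length is distinguishable
```

Other device models (rounding to the nearest unit, for example) change the exact recorded values, but the loss of discrimination at coarse resolution is the same.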
Probability Predictions
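Using the probability density function (PDF) of a distribution that fits your data, you can predict the probability that a value will be larger than, smaller than, or between values of interest.
For example, given the PDF of a normal distribution that describes your processing time, you can calculate the probability that processing time will be greater than 25, less than 10, or between 10 and 25. Because the data are continuous, the probability of any one exact value, such as exactly 21, is zero. Instead, you calculate the probability that the time falls within a small interval around 21, such as between 20.5 and 21.5, chosen to match the resolution of your measurement.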
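Here is a sketch of those calculations using Python's standard-library `statistics.NormalDist`. The mean of 18 and standard deviation of 5 are made-up values assumed for the example; in practice you would estimate them from your own data and confirm that a normal model fits.

```python
from statistics import NormalDist

# Hypothetical processing-time distribution (mean and std dev are assumptions).
proc_time = NormalDist(mu=18, sigma=5)

p_above_25 = 1 - proc_time.cdf(25)                  # P(time > 25)
p_below_10 = proc_time.cdf(10)                      # P(time < 10)
p_10_to_25 = proc_time.cdf(25) - proc_time.cdf(10)  # P(10 < time < 25)

# P(time == 21) is zero for continuous data, so use an interval that
# reflects the measurement resolution (here, to the nearest whole unit).
p_near_21 = proc_time.cdf(21.5) - proc_time.cdf(20.5)

# The PDF value itself is a density, not a probability.
density_at_21 = proc_time.pdf(21)

for label, value in [("P(>25)", p_above_25), ("P(<10)", p_below_10),
                     ("P(10-25)", p_10_to_25), ("P(20.5-21.5)", p_near_21),
                     ("density at 21", density_at_21)]:
    print(f"{label}: {value:.4f}")
```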
Why Is Variable Data Important to Understand?
Understanding the type of data you have is important because it determines the type of analysis you will do.
Correct Statistical Tools
The tools for statistical analysis are different for discrete and variable data. Using the wrong tool can lead to misleading conclusions and poor decisions.
Correct Statistics
While all data distributions can be described by their center, spread, and shape, you use different statistical descriptors for discrete and variable data.
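As one common illustration (not the only possible pairing), measured data is often summarized by its mean, median, and standard deviation, while pass/fail attribute data is summarized as a proportion with a binomial standard error. Both data sets below are made up for the example.

```python
from statistics import mean, median, stdev

# Hypothetical data for illustration only.
fill_weights_g = [498.2, 501.1, 499.7, 500.4, 502.0, 497.9, 500.8]  # variable (measured)
defective = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # attribute (1 = defective unit)

# Variable data: center and spread on the original measurement scale.
print("mean:", round(mean(fill_weights_g), 2))
print("median:", median(fill_weights_g))
print("std dev:", round(stdev(fill_weights_g), 2))

# Attribute data: summarized as a proportion, with binomial spread.
n = len(defective)
p = sum(defective) / n
std_error_p = (p * (1 - p) / n) ** 0.5
print("proportion defective:", p)
print("std error of proportion:", round(std_error_p, 3))
```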
Cost Implications
Data costs money. You should consider the type of data you want to collect to maximize the value of the information and minimize the cost of obtaining that information. Variable data is generally better than discrete data if you have a choice.
An Industry Example of Variable Data
Steve, a warehouse manager, was required to take a weekly inventory of the cases of products on hand in the warehouse. As a trained Green Belt, he knew that case count was discrete data. He wanted to construct a control chart to monitor the variation of his inventory but wasn’t sure which one to use, so he asked his Black Belt, Bonnie, for her opinion.
Bonnie suggested that Steve consider treating his data as pseudo-continuous (variable) data rather than as discrete counts. She explained that the counts were quite high (in the thousands) and that they varied widely from week to week.
Steve agreed and chose the individuals and moving range (ImR) chart to track his inventory levels.
Shown below is what his control chart looked like. He wondered what had happened at point 31, where the inventory level went above the upper control limit.
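Steve's actual data isn't given, so the sketch below uses a short, made-up weekly inventory series to show how ImR chart limits are typically computed. It applies the standard constants for moving ranges of two consecutive points: 2.66 (3 / d2, with d2 = 1.128) for the individuals chart and D4 = 3.267 for the moving range chart. The function names are hypothetical.

```python
def imr_limits(values):
    """Compute (LCL, center, UCL) for the individuals (I) and moving range (MR) charts."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "I": (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
    }


def out_of_control(values):
    """Return 1-based point numbers that fall outside the I-chart limits."""
    lcl, _, ucl = imr_limits(values)["I"]
    return [i for i, x in enumerate(values, start=1) if x < lcl or x > ucl]


# Made-up weekly case counts (not Steve's data).
weekly = [4210, 4150, 4300, 4180, 4250, 4220, 4190, 4270, 4160, 4240, 5200, 4230]
print(imr_limits(weekly))
print(out_of_control(weekly))  # [11] for this made-up series
```

A flagged point, like Steve's point 31, is a signal to investigate what changed that week, not an explanation in itself.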
Best Practices for Variable Data
Data doesn’t just show up on your desk or your computer. You have to go out and collect it. Here are a few best practices for collecting and analyzing variable data.
Use Sample Size Formulas
There are a number of sample size formulas and calculators that will help you determine the minimum sample size you’ll need to achieve your desired degree of confidence and precision.
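For illustration, here are two standard textbook formulas: the sample size needed to estimate a mean within a margin of error E, n = (z·σ / E)², and for a proportion, n = z²·p(1 − p) / E². Both assume simple random sampling and a normal approximation, and σ must be known or estimated from prior data. The function names are hypothetical; dedicated calculators may apply additional corrections.

```python
import math
from statistics import NormalDist


def z_for(confidence):
    """Two-sided z value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma, margin, confidence=0.95):
    """Minimum n to estimate a mean within +/- margin (variable data)."""
    return math.ceil((z_for(confidence) * sigma / margin) ** 2)


def n_for_proportion(margin, p=0.5, confidence=0.95):
    """Minimum n to estimate a proportion within +/- margin (attribute data).

    p = 0.5 is the conservative choice when no prior estimate exists.
    """
    z = z_for(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_for_mean(sigma=5, margin=1))  # 97
print(n_for_proportion(margin=0.05))  # 385
```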
Use Statistical Software Where Possible
The days of hand calculations of complex statistical analysis are (fortunately) gone. When using computer software, be sure you select the correct functions that apply to variable data if that’s what you are analyzing.
Use Graphics to Supplement Your Analysis
Plotting the data gives you a visual picture that can be a powerful directional indicator and a preview of what to expect once you do your statistical analysis.
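Statistical software or a plotting library is the usual way to do this. To keep the example dependency-free, the sketch below prints a plain-text histogram with equal-width bins. The function and the processing-time values are hypothetical, made up for illustration.

```python
def text_histogram(values, bins=5, width=30):
    """Return lines of a simple text histogram using equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value lands exactly on the upper edge; keep it in the last bin.
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * step
        bar = "#" * round(width * c / peak)
        lines.append(f"{left:6.1f} to {left + step:6.1f} | {bar} {c}")
    return lines


# Made-up processing times for illustration.
times = [12.4, 15.1, 18.3, 17.9, 21.0, 19.5, 16.2, 23.8, 18.8, 14.7, 20.3, 17.1]
for line in text_histogram(times):
    print(line)
```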
What Constitutes Variable Data?
Variable data can come in many different forms. It includes any measurable characteristic of your process, such as an object’s weight or the speed at which a machine operates. These characteristics are usually easy to observe and measure, and they don’t require large sample sizes to describe.
Other Useful Tools and Concepts
While we’ve covered the essentials of variable data, it isn’t the only thing to consider in Lean Six Sigma (LSS). You might want to review the role of process owners and how they factor into your processes and production workflow.
Additionally, you might want to take a closer look at interaction, which occurs in a design of experiments (DOE) when the effect of one factor depends on the level of another. Studying interactions can reveal how combinations of choices affect your production.
Conclusion
We’ve defined variable data as the values derived from measuring things with a measurement device that can take on any value along a continuum of possible values and can be logically subdivided into smaller and smaller units.
If given a choice, you should try to use variable data to understand your processes and make decisions. The ability to gain greater resolution will allow for greater discrimination between the things you’re measuring. The statistical tools you can use for variable data are more powerful than those used for discrete data — plus, you won’t need as much data to do your analysis.