In a regression model, some of the variation in y is left unexplained by x unless r squared equals 100%.

When you work with a regression model, you will almost always encounter some variation that the model does not explain. This variation still needs to be accounted for and checked to make sure it behaves the way the model assumes.

Overview: What is the unexplained variation?

Unexplained variation is also known as residual variation or error variation. It refers to the random scatter of the data points around the regression line. It is measured as the sum of squared differences between each ordered pair’s observed y-value and the corresponding y-value predicted by the model. It is the error portion of the regression equation.
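To make the definition concrete, here is a minimal sketch that fits a straight line by ordinary least squares and then adds up the squared differences between observed and predicted y-values. The function names and the data are made up for illustration and are not part of any particular library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def unexplained_variation(xs, ys):
    """Sum of squared differences between observed and predicted y-values."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sse = unexplained_variation(xs, ys)
print(round(sse, 4))  # 2.4
```

For this small data set the fitted line is y = 2.2 + 0.6x, and the squared residuals add up to 2.4.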

Benefits and drawbacks of unexplained variation

There are both benefits and drawbacks to unexplained variation that are worth mentioning:

1. It can be beneficial if the unexplained variation shows consistency

If the unexplained variation is consistent across the range of x, meaning the residuals have a similar spread everywhere, it supports the conclusion that the data follows a linear relationship.

2. It can be a drawback if the unexplained variation is inconsistent

If the unexplained variation is inconsistent, the model will predict y-values well in some ranges of x and poorly in others.

3. The higher the unexplained variance, the bigger the drawback

The more variance that is left unexplained, the less of the variation in the data the model accounts for.

Why is unexplained variation important to understand?

Unexplained variation is important to understand for the following reasons:

1. Discrepancy

Unexplained variation matters because it measures the discrepancy between the actual data and the model’s predictions.

2. Finding the total variation

Unexplained variation is one of the two components of total variation: the total variation is the sum of the explained variation and the unexplained variation.

3. σ or σ²

Unexplained variation is sometimes signified by the symbols σ (the standard deviation of the errors) or σ² (their variance). Recognizing these symbols helps you know how to proceed when you encounter them in an equation or model.
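The relationships in this list can be sketched in a few lines of Python. The example below splits the total variation into its explained and unexplained parts for a least-squares line, then derives r squared and an estimate of σ. The function name and data are illustrative assumptions, and dividing by n - 2 is the usual convention for a simple linear regression with one predictor.

```python
import math


def variation_breakdown(xs, ys):
    """Split total variation into explained and unexplained parts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((p - mean_y) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return {
        "total": total,
        "explained": explained,
        "unexplained": unexplained,
        "r_squared": explained / total,
        # Estimate of sigma: residual standard error with n - 2 degrees of freedom.
        "sigma_hat": math.sqrt(unexplained / (n - 2)),
    }


# Made-up data, for illustration only.
result = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(name, round(value, 4))
```

With this data the total variation of 6 splits into 3.6 explained and 2.4 unexplained, so r squared is 0.6.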

An industry example of unexplained variation

In working with a regression model, a statistician finds a startling amount of unexplained variation. In order to reduce it, they recheck their data to see if there was a mistake anywhere. Unfortunately, it appears that all the data is sound. With this information, all they can do is confirm that the residuals scatter randomly around the regression line with no remaining pattern, and add the unexplained variation to the explained variation to find the total amount of variation.

3 best practices when thinking about unexplained variation

There are some practices that should be considered when working with unexplained variation:

1. Graph your residuals

Graphing your residuals is a good practice since it helps you confirm that a linear model suits your data: the residuals should scatter randomly around zero, with no curve, trend, or isolated extreme points.

2. Remove outliers

If any data points have residuals far larger than the rest and are outliers that are too influential to the model, it may be necessary to remove them from your data.

3. Work towards improving reliability

You can check whether unexplained variation can be reduced by reducing measurement error or by increasing the actual interindividual variability in your sample.
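The first two practices can be sketched together: compute the residuals (the values you would graph against x) and flag any whose size is large relative to the typical residual. This is a rough sketch under stated assumptions: the function name and data are made up, and the cutoff of 2 residual standard errors is a common rule of thumb, not a fixed standard.

```python
import math


def residuals_and_outliers(xs, ys, threshold=2.0):
    """Return the residuals and the indices of unusually large ones."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error: typical size of a residual.
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # Assumption: flag residuals more than `threshold` standard errors from zero.
    flagged = [i for i, r in enumerate(residuals) if abs(r) / s > threshold]
    return residuals, flagged


# Made-up data: roughly y = 2x + 1, with one suspicious value at x = 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 20.0, 15.0, 17.1, 18.9, 21.0]
residuals, flagged = residuals_and_outliers(xs, ys)
for i in flagged:
    print("possible outlier:", xs[i], ys[i])
```

A flagged point is only a candidate for removal. Whether to drop it depends on whether it reflects a recording error or a genuine observation.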

Frequently Asked Questions (FAQ) about unexplained variation

Is low or high variation better?

Lower unexplained variation is generally better, since it means the model’s predictions are closer to the actual data.

How much unexplained variation is acceptable?

That depends on outside factors, such as the field you work in and what the model will be used for. The higher the unexplained variation, however, the more likely it is that the validity of your model or data will be called into question.

How can you know if the unexplained variation is the same across all predictions?

You can check this by making sure that the spread of the residuals stays consistent across all predicted values.
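One rough way to check this numerically is to compare the spread of the residuals in the lower half of the x-values with the spread in the upper half. The sketch below is an informal check, not a formal statistical test, and the function name and data are made up for illustration. A ratio near 1 suggests a consistent spread, while a ratio far from 1 suggests the spread changes with x.

```python
def residual_spread_ratio(xs, ys):
    """Mean squared residual for the upper half of x divided by the lower half."""
    pairs = sorted(zip(xs, ys))  # order the points by x
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = n // 2
    lower, upper = residuals[:half], residuals[half:]

    def mean_square(values):
        return sum(v ** 2 for v in values) / len(values)

    # Note: this divides by zero if the lower half is fitted perfectly.
    return mean_square(upper) / mean_square(lower)


# Made-up data: the scatter around the line grows as x increases.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.1, 7.9, 12.0, 10.0, 16.0, 14.0]
ratio = residual_spread_ratio(xs, ys)
print(ratio > 10)  # True: the upper half is far more scattered
```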

Don’t Panic if You Have Unexplained Variation

Variation is normal in data, as is coming across unexplained variation. The important thing is to check that your data is sound and that the tools you used for measurement were used reliably. If you need to re-enter your data to make sure, it isn’t the worst thing in the world. If it turns out that you did all your steps correctly, hopefully your residuals scatter randomly around the regression line, which indicates that a linear model fits. It may take some investigation to turn the unexplained variation into explained variation. Sometimes, though, the unexplained will have to remain unexplained.
