In your regression model, it is vital that you can trust the data. If you find that independent variables are correlated with one another, you may need to address the issue so that you can make sound interpretations of the model.
What is multicollinearity?
Multicollinearity is a phenomenon in which one or more independent variables in a regression model are highly correlated with other independent variables. Since these variables are supposed to be independent of one another, this can call into question the reliability of the model's statistical inferences.
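To make the idea concrete, here is a minimal sketch, using simulated data and invented variable names, that builds two predictors from a shared underlying signal and then inspects the pairwise correlations among the predictors:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# x1 and x2 are built from a shared underlying signal, so they end up
# highly correlated with each other; x3 is generated independently.
signal = rng.normal(size=n)
X = pd.DataFrame({
    "x1": signal + rng.normal(scale=0.1, size=n),
    "x2": signal + rng.normal(scale=0.1, size=n),
    "x3": rng.normal(size=n),
})

# A pairwise correlation near 1.0 between x1 and x2 is the warning sign.
print(X.corr().round(2))
```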
3 drawbacks of multicollinearity
Multicollinearity can have some major drawbacks for your regression model that you should be aware of:
1. Weakening of statistical power
Multicollinearity reduces the precision of the estimated coefficients by inflating their standard errors, which weakens the statistical power of your regression model.
2. P-values become untrustworthy
Multicollinearity can prevent your p-values from reliably identifying which independent variables are statistically significant, which makes those p-values untrustworthy.
3. Coefficient sensitivity
Coefficient estimates can swing widely depending on which other independent variables are in your regression model. Even small changes to the model can cause major changes in the coefficient estimates.
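The sketch below, again on simulated data with invented names, illustrates the first and third drawbacks together: with two nearly identical predictors in the model, the coefficient on x1 is imprecise, and dropping the redundant predictor (a small change to the model) shifts the estimate substantially:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200

# Simulated data: y truly depends on x1 only, but x2 is nearly a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # highly collinear with x1
y = 2.0 * x1 + rng.normal(size=n)

# Model A: both collinear predictors included.
fit_both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Model B: the same model with x2 dropped.
fit_one = sm.OLS(y, sm.add_constant(x1)).fit()

# With both predictors, the standard errors balloon and the x1 estimate
# can land far from its true value of 2.0; without x2, it stabilizes.
print("both:   ", fit_both.params.round(2), fit_both.bse.round(2))
print("x1 only:", fit_one.params.round(2), fit_one.bse.round(2))
```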
Why is multicollinearity important to understand?
If you are working with regression, multicollinearity is important to understand for the following reasons:
Certain kinds of regression do not work as well with multicollinearity.
Some kinds of regression, such as stepwise regression, do not handle multicollinearity well, so understanding it and its effects is important when choosing a modeling approach.
Analysis of coefficients
One major reason for understanding multicollinearity is that checking for it should generally precede the analysis of your regression coefficients, since correlated predictors can distort what those coefficients appear to say.
Understanding how much is too much
An understanding of multicollinearity gives you the tools to judge when there is too much of it in your regression model and when you can work around it. In some cases, even a lot of multicollinearity will not affect what you are trying to glean from the data. Knowing this can save you a great deal of time and frustration.
An industry example of multicollinearity
Suppose a shop wants to model customer loyalty using several measures of customer satisfaction as predictors. Two of the measures turn out to be highly correlated, and the resulting multicollinearity is deemed problematic. On inspection, the two measures are really facets of the same underlying category of satisfaction, so the better course is to combine them into a single measure. Doing so fixes the multicollinearity issue.
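A sketch of that fix, with invented column names standing in for the shop's satisfaction measures, might look like the following; averaging is just one reasonable way to combine the two:

```python
import pandas as pd

# Hypothetical survey data: staff_satisfaction and service_satisfaction
# turn out to be highly correlated facets of the same underlying category.
df = pd.DataFrame({
    "staff_satisfaction":   [4, 5, 3, 4, 2],
    "service_satisfaction": [4, 5, 3, 5, 2],
    "price_satisfaction":   [3, 2, 4, 3, 5],
})

# Replace the two overlapping measures with a single combined score
# (here, their average), then drop the originals before refitting.
df["overall_satisfaction"] = df[["staff_satisfaction",
                                 "service_satisfaction"]].mean(axis=1)
df = df.drop(columns=["staff_satisfaction", "service_satisfaction"])
print(df)
```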
3 best practices when thinking about multicollinearity
Multicollinearity does not have to make your regression model unusable or force heavy adjustments. Here are some things to keep in mind so that it need not become a major issue every time it comes up:
1. Problems increase with more multicollinearity
How much of a problem you have depends on the degree of multicollinearity. If it is not severe, you may not need to fix it at all.
2. Does multicollinearity affect the variables you are interested in?
Multicollinearity only affects the independent variables that are correlated with one another, so you may not need to make adjustments if the specific variables you are interested in are unaffected.
3. Predictions
Multicollinearity does not affect the model's predictions or their precision. So if your primary goal is prediction and you do not need to understand each independent variable's individual role, there is no need to reduce even severe multicollinearity.
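To see why, consider this sketch on simulated data: two fits whose individual coefficients differ substantially still produce nearly identical fitted values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 200

# Two nearly identical predictors and a response driven by their shared signal.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 2.0 * x1 + rng.normal(size=n)

X_both = np.column_stack([x1, x2])
X_one = x1.reshape(-1, 1)

pred_both = LinearRegression().fit(X_both, y).predict(X_both)
pred_one = LinearRegression().fit(X_one, y).predict(X_one)

# The coefficients of the two fits can differ wildly, yet the predictions
# they produce are nearly indistinguishable.
print(np.corrcoef(pred_both, pred_one)[0, 1])   # very close to 1.0
print(np.max(np.abs(pred_both - pred_one)))     # very small
```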
Frequently Asked Questions (FAQ) about multicollinearity
How do I deal with multicollinearity?
There are a few solutions for dealing with multicollinearity if it appears that you cannot work around it:
- Remove some of the most highly correlated independent variables.
- Combine the correlated independent variables linearly, for example by adding them together or averaging them.
- Use an analysis designed for correlated predictors, such as partial least squares regression or principal components analysis (sketched below).
Just remember that if you do decide to remedy the multicollinearity, each of these solutions has downsides of its own.
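As one illustration of the third option, the sketch below uses principal components analysis (via scikit-learn, on simulated data) to replace a block of correlated predictors with components that are uncorrelated by construction:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 200

# Three predictors sharing one underlying signal, so they are strongly
# intercorrelated.
signal = rng.normal(size=n)
X = np.column_stack([signal + rng.normal(scale=0.2, size=n)
                     for _ in range(3)])

# Standardize, then project onto principal components. The components can
# be used as regressors in place of the original collinear variables.
X_std = StandardScaler().fit_transform(X)
components = PCA(n_components=2).fit_transform(X_std)

# Off-diagonal correlations are essentially zero.
print(np.corrcoef(components, rowvar=False).round(2))
```

The trade-off, as noted above, is interpretability: each component mixes together all of the original variables.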
What is the best method for identifying multicollinearity?
Computing the variance inflation factor (VIF) is a simple method for identifying multicollinearity in your regression model. A predictor's VIF measures how much the variance of its coefficient estimate is inflated by correlation with the other predictors.
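Here is a minimal sketch of a VIF check, assuming statsmodels is available and using simulated data with invented variable names:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(4)
n = 200

# Simulated predictors: x1 and x2 overlap heavily, x3 stands alone.
signal = rng.normal(size=n)
X = pd.DataFrame({
    "x1": signal + rng.normal(scale=0.1, size=n),
    "x2": signal + rng.normal(scale=0.1, size=n),
    "x3": rng.normal(size=n),
})

# Include a constant so each VIF is computed against a model with an
# intercept, then compute one VIF per predictor (skipping the constant).
X_const = add_constant(X)
vifs = {col: variance_inflation_factor(X_const.values, i)
        for i, col in enumerate(X_const.columns) if col != "const"}
print(vifs)  # common rules of thumb flag VIFs above 5 or 10
```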
How much multicollinearity is too much?
There is no universal cutoff, and opinions differ. Some analysts treat pairwise correlations below 0.6 (60%) as acceptable, while others consider even 0.3 to 0.4 (30% to 40%) far too high. If you are working with VIFs instead, values above 5 or 10 are common rules of thumb for flagging a problem. The caution exists because even one highly correlated variable can exaggerate the model's apparent performance or significantly distort the parameter estimates.
Multicollinearity and your regression models
When you see independent variables correlating with one another, there may be a multicollinearity issue. That does not necessarily mean you need to act: sometimes the amount of multicollinearity does not matter for the results you are after. When it does turn out to be a problem, though, there are thankfully ways to resolve it.