Key Points

  • Skewness measures the symmetry of your data.
  • It can indicate whether or not your data adheres to a normal distribution.
  • It’s okay for your data to have skewness; it can indicate outliers in your data set.

If your data demonstrates skewness, it’s not a good or bad thing. It is the shape of your data. This article will discuss the different types of skewness and what it means for your data. 

What Is Skewness?

Skewness is a measure of the symmetry, or more precisely the lack of symmetry, of your data distribution. A distribution is symmetric if it looks the same to the left and right of the center point. 

For example, the normal distribution is a symmetrical distribution where the three measures of central tendency (mean, median, and mode) are all the same, and half the data falls left of the center and half to the right. A symmetrical distribution will have a skewness value of zero. 

You can see this in the graph below.

For the distribution of a single variable, the formula for skewness, known as the Fisher-Pearson coefficient of skewness, is the third standardized moment:

Skewness formula
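
Written out, a common form of the Fisher-Pearson coefficient is shown here. The notation is an assumption, since the symbols are not defined in the text: x_i are the individual observations, x-bar is their mean, and N is the number of observations. Some software applies an additional sample-size correction, so reported values can differ slightly for small data sets.

```latex
g_1 = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^3}
           {\left[\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2\right]^{3/2}}
```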

The sign of the skewness value indicates the direction of the longer tail. A negative value, or left skew, means the left tail is long relative to the right tail and extends toward lower values. 

A positive value, or right skew, means the right tail is long relative to the left tail and extends toward higher values. 
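
To make the sign convention concrete, here is a minimal Python sketch that computes the third standardized moment using only the standard library. The function name `skewness` and the sample values are illustrative assumptions, not part of any particular statistics package.

```python
from statistics import fmean

def skewness(data):
    """Third central moment divided by the 1.5 power of the second central moment."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return m3 / m2 ** 1.5

# Illustrative samples (made-up values)
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail toward higher values
left_tailed = [-x for x in right_tailed]    # mirror image: long tail toward lower values
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive value
print(skewness(left_tailed))   # negative value of the same size
print(skewness(symmetric))     # zero
```

Mirroring the data flips the sign but not the magnitude, which is why the sign alone tells you which side the long tail is on.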

Some measurements have a natural lower bound and will naturally skew right. For example, time can be infinitely long but can’t be less than zero…at least not without a time machine. Outliers can also pull your distribution to one side, resulting in skewness.

The relative values of the mean, median, and mode will indicate skewness. In a symmetrical distribution, they will all be equal. 

In the graph below, you can see the relative positions of the mean, median, and mode depending on the direction of the skewness. By comparing just these three descriptive statistics, you can gauge the direction and rough degree of the skewness. This is why you should not rely on the average of your data alone without comparing it to the median and looking for outliers. Combined with the actual skewness value, these statistics let you describe the shape of your distribution.

Relative values of central tendency
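
As a rough sketch of this comparison, the following Python snippet computes the three measures for a small made-up sample and reports which way the mean sits relative to the median. The helper name `tail_hint` and the data are assumptions for illustration. The comparison is a rule of thumb rather than a formal test.

```python
from statistics import fmean, median, mode

def tail_hint(data):
    """Compare mean and median and return a hint about the longer tail."""
    mean, med = fmean(data), median(data)
    if mean > med:
        return "mean above median: suggests a longer right tail"
    if mean < med:
        return "mean below median: suggests a longer left tail"
    return "mean equals median: no skew suggested by this check"

# Made-up sample with a few large values on the right
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

print("mean:", fmean(sample))
print("median:", median(sample))
print("mode:", mode(sample))
print(tail_hint(sample))
```

In this sample the mode is lowest, the median is in the middle, and the mean is highest, which is the ordering usually associated with right skew.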

Benefits of Knowing Your Skewness

Skewness is not necessarily an anomaly in your data. It may be a function of the nature of the characteristic you are measuring. Here are some benefits of knowing what your skewness means.

Existence of Outliers

A distribution may be skewed as a result of an outlier. If your data contains one, you will want to determine whether that outlier is the cause of the skewness. If it is, the underlying distribution may actually be symmetrical, and you can make the appropriate decision about your data.
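
One way to check whether a single point is driving the skewness is to compute the value with and without it. The sketch below uses hypothetical readings and a hand-written `skewness` helper (both assumptions for illustration). Whether the point should actually be excluded from your analysis is a separate decision that depends on its cause.

```python
from statistics import fmean

def skewness(data):
    """Third standardized moment (Fisher-Pearson coefficient)."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical readings; 40 is the suspected outlier
readings = [8, 9, 10, 10, 10, 11, 12, 40]
suspect = 40
without_suspect = [x for x in readings if x != suspect]

print("with suspect point:   ", skewness(readings))
print("without suspect point:", skewness(without_suspect))
```

Here the remaining values are symmetric around 10, so the skewness drops to zero once the suspect point is set aside.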

Easy to Compute and Visualize

Most statistical software will compute the skewness value for you. The closer the value is to zero, the more symmetrical your data. The sign of the value tells you in which direction to look for an explanation of your skewness. A histogram will give you a visual picture of the data in which any skewness should be easy to see.
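
If no plotting tool is at hand, even a text histogram can reveal the shape. This is a minimal sketch using only the Python standard library. The function name `text_histogram`, the bin width, and the sample values are all assumptions for illustration.

```python
from collections import Counter

def text_histogram(data, bin_width=1.0):
    """Return one line per bin, with one '#' per observation in that bin."""
    # Assign each value to the bin that starts at floor(x / bin_width) * bin_width
    bins = Counter(int(x // bin_width) for x in data)
    lines = []
    for b in range(min(bins), max(bins) + 1):
        low = b * bin_width
        lines.append(f"{low:>5.1f} | " + "#" * bins[b])  # empty bins show no marks
    return lines

# Made-up sample with a long right tail
sample = [0.5, 0.8, 1.1, 1.2, 1.4, 1.7, 2.3, 2.6, 3.9, 6.2]

for line in text_histogram(sample):
    print(line)
```

The bars pile up on the left and thin out toward the right, which is the visual signature of right skew.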

Provides Insight Into Your Data

If skewness is indicated, you should review your data to understand the cause of the skewness. If something unexplained or unexpected occurs, you can take the appropriate actions. 

Understanding Positive and Negative Skewness Values

So, what does it mean if your skewness has a positive or negative value? Simply put, the sign tells you on which side of the distribution the longer tail lies. A negative value indicates left skewness, while a positive value indicates right skewness.

Why Is Skewness Important to Understand?

Since your data is a reflection of your process, understanding the reasons for skewness will help explain your process. 

Understanding How Your Data Is Distributed

Not all distributions should be assumed to be symmetrical. Skewness will help you understand how your data is distributed. 

Normal Distribution

Not all symmetrical distributions are normally distributed. But, since the normal distribution is an underlying assumption of many statistical tests, you can use your skewness value to check whether your data is at least symmetrical. If it is not, your data will not be normally distributed, and it will fail the normality assumption of any test that requires it. 
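
To illustrate why a skewness of zero is not enough on its own to conclude normality, the sketch below uses a made-up U-shaped sample in which most values sit at the two extremes. The `skewness` helper and the data are assumptions for illustration.

```python
from statistics import fmean

def skewness(data):
    """Third standardized moment (Fisher-Pearson coefficient)."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Symmetric around 3, but most observations are at the edges rather than the center,
# which is the opposite of the bell shape of a normal distribution
u_shaped = [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]

print("mean:", fmean(u_shaped))
print("skewness:", skewness(u_shaped))
```

The skewness is zero because the sample is perfectly symmetric, yet a histogram of it would look nothing like a bell curve. A near-zero skewness value is therefore best treated as a first screen, followed by a plot or a formal normality test.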

Use for Prediction

Many people use the average to predict or make projections. But, if your data is skewed, the average may not represent the true central tendency of the data. You may be better off using the median unless there was a specific cause for the skewness and you took corrective action to revert to a symmetrical distribution.

An Industry Example

The facilities manager of an office building was reviewing the maintenance records for the eight elevators and noticed the average downtime for an elevator was 3 hours. 

He was quite upset about the inconvenience this could be causing the building’s tenants. He was about to fire the elevator maintenance company when one of his staff suggested he make a histogram and look a little deeper into the data. 

This is what the histogram looked like:

It became obvious that the average of 3 hours was not representative of the true process. The mean was pulled to the right by a number of unusually long downtimes caused by a lack of available replacement parts. The median of 1.75 hours was more indicative of the true performance of the elevators.

Action was taken to stock a higher number of the parts that took longer to procure so the longer downtimes could be eliminated.
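
The gap between the mean and the median in this example can be reproduced with a small sketch. The downtime values below are hypothetical: they are not the manager's actual records and were chosen only so that the mean comes out to 3 hours and the median to 1.75 hours, matching the two figures quoted above.

```python
from statistics import fmean, median

# Hypothetical downtimes in hours (NOT the real maintenance records).
# The two largest values stand in for repairs delayed by missing parts.
downtimes = [1.0, 1.25, 1.5, 1.5, 2.0, 2.0, 5.75, 9.0]

mean_hours = fmean(downtimes)
median_hours = median(downtimes)

print("mean downtime:  ", mean_hours)
print("median downtime:", median_hours)
print("downtimes above the mean:", [x for x in downtimes if x > mean_hours])
```

Only two of the eight hypothetical values exceed the mean, which shows how a couple of long repairs can make the average look worse than the typical experience.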

Skewness Best Practices

Here are some tips on how to best utilize information about your data and any skewness you may have.

Plot Your Data

Graphical plots like a histogram or dot plot will give you a quick visual of the distribution.

Look for a Special Cause

Answer the question of whether the skewness is a natural condition of the data or due to some special cause like outliers or a multi-modal distribution. 

Don’t Use the Mean if the Distribution Is Too Skewed

The median may be a better measure of central tendency since the mean can be distorted by skewness.

Other Useful Tools and Concepts

Looking for some other tools to get you going? You might do well to learn about how to achieve process stability. Any data source you look at can clue you in to how your process is performing. Equipping yourself with the right strategies sets you up for success.

Further, you might need to learn how entitlement works within the context of process improvement. Entitlement represents the highest achievable output in your processes and can be a pleasant side effect of stable and capable processes.

Conclusion

The symmetry of your data distribution is measured by skewness. A perfectly symmetrical distribution will have a skewness value of 0; mean, median and mode values will be the same, and half your data will fall to the left of the center of your distribution and half to the right. Skewness can result from a data outlier, or a natural upper or lower bound to your data. 

There are two quick and easy ways to determine whether your data is skewed. The first is to plot the data on a histogram or dot plot. The second is to compare the values of your mean and median. If they differ noticeably, your mean has been distorted. 

When your mean is higher than your median, you typically have positive, or right, skewness. When it is lower, you typically have negative, or left, skewness.
