What if my data aren't normal?


People I talk to are sometimes worried because their data aren't normally distributed. They believe that they can't use the usual techniques such as t tests or ANOVA without first transforming the data to be normal, or that they must resort to non-parametric methods.

There are many good reasons for transforming data or NOT using t tests or F tests, but non-normality of the raw data is not usually one of them!

Suppose you think that male bats of the species Megaderma spasma may be bigger than females. You collect measurements of forearm length for a large number of adult bats and plot a histogram of the raw measurements:

Clearly the raw data are not normally distributed: indeed, they seem to be bimodal. But isn't that what we'd expect if there really were a difference between males and females? Let's plot the forearm lengths separately:
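To see why pooling two groups produces a bimodal histogram, here is a minimal Python sketch using only the standard library. The forearm lengths are simulated, not real measurements: the means (60 mm and 56 mm), the standard deviations, and the sample sizes are invented purely for illustration, and `text_histogram` is a hypothetical helper.

```python
import random
from collections import Counter

rng = random.Random(1)
# Simulated forearm lengths in mm; all parameters are invented for illustration.
males = [rng.gauss(60.0, 1.0) for _ in range(200)]
females = [rng.gauss(56.0, 1.3) for _ in range(200)]

def text_histogram(values, label):
    """Print a crude histogram with 1 mm bins and return the bin counts."""
    counts = Counter(round(v) for v in values)
    print(label)
    for mm in range(min(counts), max(counts) + 1):
        # One '#' per two bats keeps the bars short
        print(f"{mm:3d} {'#' * (counts[mm] // 2)}")
    return counts

pooled = text_histogram(males + females, "All bats (pooled)")
text_histogram(males, "Males only")
text_histogram(females, "Females only")
```

Under these assumptions the pooled histogram should show two peaks with a dip between them, while each single-sex histogram has one peak.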

The data for each group look approximately normal, so a t test will be fine. There's a hint from the histograms that the variance for females is a little higher than for males, so maybe use Welch's version of the t test, which does not assume equal variances. Or, better, use BEST (Bayesian estimation supersedes the t test).
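As a rough illustration of Welch's version of the t test, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom by hand in Python. The data are simulated with invented means and standard deviations, and `welch_t` is a hypothetical helper, not a library function. It stops short of a p-value, which would need the t distribution from a statistics package.

```python
import random
import statistics
from math import sqrt

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean: sample variance / n
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

rng = random.Random(1)
# Simulated forearm lengths in mm; all parameters are invented for illustration.
males = [rng.gauss(60.0, 1.0) for _ in range(200)]
females = [rng.gauss(56.0, 1.3) for _ in range(200)]

t, df = welch_t(males, females)
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

Because the two variances are estimated separately, the degrees of freedom come out at or below the pooled value of n1 + n2 - 2, which is the price paid for dropping the equal-variance assumption.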

For ANOVA or linear regression, the assumption is that the residuals are normally distributed; the response data themselves do not need to be normal. If you are doing logistic regression for binomial data or Poisson regression for count data, normality is irrelevant. In no case is it necessary for the covariates (or predictors) to be normally distributed.
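The distinction between the data and the residuals can be checked with a small simulation. In this Python sketch (standard library only) the predictor is deliberately skewed, so the response is skewed too, yet the residuals from a straight-line fit are roughly symmetric. The model `y = 2 + 3x + error`, the exponential predictor, and the `skewness` helper are all assumptions made for the example.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed standardised deviation (population form)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(42)
n = 2000
# Strongly right-skewed predictor (invented for illustration)
x = [rng.expovariate(1.0) for _ in range(n)]
# Response = straight line plus normally distributed errors
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for a single predictor
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

A histogram of `y` here would look far from normal, but that says nothing about whether the regression assumptions hold; it is the residuals that should be inspected.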

For more details, take a look at petrkeil's blog post here.

Updated 21 Feb 2013 by Mike Meredith