Thinking about prior distributions
Bayesian analysis requires priors. I see that as a huge practical advantage, especially when producing information for management: we can incorporate information derived from past research over more than 100 years. The tradition in ecology, however, is to use priors which give minimal information. Truly 'uninformative' priors do not exist - with the possible exception of a uniform prior for a probability - so care is needed to provide information which makes biological sense within the context of your own analysis.
As a colleague has pointed out, many specific types of priors are described in the literature, but rarely with a proper discussion of the reasons for choosing that particular distribution and those parameters. This is understandable with large complex models, such as multi-species occupancy models, with many priors and hyperpriors, but remains a problem. This post aims to give you some ways to explore possible prior distributions and assess their suitability for your analysis.
Stick to what you know
Set priors on quantities that have biological meaning. You will have an intuitive feel for the plausible range of values. They may be transformed - perhaps as a log or a logit - in the model, but set priors for the real values.
This applies specifically to standard deviations (SDs) for normal and t distributions, where JAGS uses precision (1/SD^2): put the prior on the SD and calculate the precision from it.
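As a rough illustration of working with the SD rather than the precision, here is a minimal Python sketch (the post itself works with JAGS). The Uniform(0, 5) prior and the helper name `sd_to_precision` are assumptions made up for this example; the only fixed fact used is that precision = 1/SD^2.

```python
import random
import statistics

random.seed(1)

def sd_to_precision(sd):
    """Convert a standard deviation to a precision, tau = 1 / sd^2."""
    return 1.0 / sd ** 2

# Hypothetical prior on the SD: uniform on (0, 5], in the units of the response.
# 1 - random.random() lies in (0, 1], so the SD is never exactly zero.
sd_draws = [5.0 * (1.0 - random.random()) for _ in range(10_000)]

# The precision is derived from the SD; it never gets a prior of its own.
tau_draws = [sd_to_precision(sd) for sd in sd_draws]

print("median SD:           ", round(statistics.median(sd_draws), 2))
print("median precision:    ", round(statistics.median(tau_draws), 3))
print("precision for SD = 2:", sd_to_precision(2.0))
```

The SD draws stay on a scale you can judge against the data, while the implied precisions are heavily skewed and much harder to reason about directly.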
When is uniform informative?
Probabilities have to be between 0 and 1, so a uniform probability distribution is proper, and it should not be informative, right? But once you transform your probabilities to the logit scale, they look very different:
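One way to check this numerically is to draw from the uniform prior and transform the draws. The following is a minimal Python sketch, not the code behind the original figure; the sample size and variable names are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Uniform(0, 1) draws, dropping an exact 0 so that the log stays finite
p_draws = [p for p in (random.random() for _ in range(100_000)) if 0.0 < p < 1.0]
logit_draws = [logit(p) for p in p_draws]

sd_logit = statistics.stdev(logit_draws)
within_5 = sum(abs(x) < 5 for x in logit_draws) / len(logit_draws)

print("SD on the logit scale:     ", round(sd_logit, 2))
print("theoretical SD, pi/sqrt(3):", round(math.pi / math.sqrt(3), 2))
print("proportion within +/- 5:   ", round(within_5, 3))
```

The logit of a uniform variable follows a standard logistic distribution, which is where the SD of pi/sqrt(3) comes from.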
On the logit scale our "uninformative" uniform prior becomes a really tight distribution with an SD of 1.8, which would be judged to be highly informative! For a prior on the logit scale, we'd want a normal prior with SD 5 or 10. Let's see what that means on the probability scale:
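The reverse check can be sketched the same way: draw from a normal prior with SD 10 on the logit scale and back-transform to probabilities. This is a minimal Python illustration with arbitrary cut-offs (0.01, 0.25, 0.75, 0.99) chosen only to summarise the shape.

```python
import math
import random

random.seed(42)

def inv_logit(x):
    """Numerically stable inverse logit (logistic function)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

n = 100_000
# "Vague" prior on the logit scale: Normal(mean 0, SD 10)
logit_draws = [random.gauss(0.0, 10.0) for _ in range(n)]
p_draws = [inv_logit(x) for x in logit_draws]

# How much prior mass sits at the extremes versus the middle?
frac_extreme = sum(p < 0.01 or p > 0.99 for p in p_draws) / n
frac_middle = sum(0.25 < p < 0.75 for p in p_draws) / n

print("proportion below 0.01 or above 0.99:", round(frac_extreme, 3))
print("proportion between 0.25 and 0.75:   ", round(frac_middle, 3))
```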
Now our minimally-informative, broad prior on the logit scale becomes highly informative - and nonsensical - on the probability scale. I can't think of an example where values near 0 and 1 are most plausible and those in the middle are least plausible.
In a logistic regression model the intercept is the logit-scale equivalent of the probability when all covariates = 0. With centred covariates that corresponds to something meaningful, and you should put a prior on the probability then convert to the logit form. For example:
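A minimal Python sketch of that workflow: draw the probability from a prior you can justify, then derive the intercept from it. The Beta(2, 2) prior and the names `p0` and `b0` are assumptions for illustration only. In JAGS the same idea would presumably be a `dbeta` prior on the probability, with the intercept obtained via `logit()`.

```python
import math
import random
import statistics

random.seed(3)

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical prior on the probability when all centred covariates are 0
raw = (random.betavariate(2.0, 2.0) for _ in range(20_000))
p0_draws = [p for p in raw if 0.0 < p < 1.0]

# The intercept is derived from the probability, not given its own prior
b0_draws = [logit(p) for p in p0_draws]

b0_sorted = sorted(b0_draws)
lo = b0_sorted[int(0.025 * len(b0_sorted))]
hi = b0_sorted[int(0.975 * len(b0_sorted))]
print("mean of p0:", round(statistics.mean(p0_draws), 3))
print("central 95% of the implied intercept:", round(lo, 2), "to", round(hi, 2))
```

Because the prior is specified on the probability, you can judge it directly against what you know about the species or system, and the implied intercept takes care of itself.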
Similar logic applies to log-transformed density or abundance: put a prior on the biological quantity and convert as necessary for the regression.
Look at the distribution you use
A key part of assessing the suitability of a prior distribution is to plot it. I also find it helpful to generate 20-30 random values from the distribution and plot them as a rug.
In a recent post on
cross-validation we used a half-Cauchy hyper-prior with
scale parameter 2.25 for the coefficients of
In the plot we see that most of the mass for the Cauchy distribution is below 5, but the tails allow for very large values - we have values in the thousands, which are surely preposterous. The t4 distribution gives much more sensible values.
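The comparison can be explored numerically as well as graphically. The sketch below is a minimal Python illustration, not the code behind the original plot. It assumes a half-Cauchy with scale 2.25 and, as an assumption of this example, a half-t with 4 degrees of freedom and the same scale.

```python
import math
import random

random.seed(7)
SCALE = 2.25  # scale from the post; reusing it for the t4 is an assumption

def half_cauchy(scale):
    """Inverse-CDF draw from a Cauchy, folded at zero."""
    return abs(scale * math.tan(math.pi * (random.random() - 0.5)))

def half_t(df, scale):
    """t draw as z / sqrt(chi2 / df), folded at zero; chi2_df = Gamma(df/2, scale 2)."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)
    return abs(scale * z / math.sqrt(chi2 / df))

def quantile(sorted_values, q):
    """Simple empirical quantile from an already sorted list."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

# A text "rug": 25 random values from each prior
print("half-Cauchy rug:", [round(half_cauchy(SCALE), 1) for _ in range(25)])
print("half-t4 rug:    ", [round(half_t(4, SCALE), 1) for _ in range(25)])

# Larger samples to summarise the bulk and the tails
n = 100_000
big_cauchy = sorted(half_cauchy(SCALE) for _ in range(n))
big_t4 = sorted(half_t(4, SCALE) for _ in range(n))

for name, draws in (("half-Cauchy", big_cauchy), ("half-t4", big_t4)):
    below5 = sum(x < 5 for x in draws) / n
    print(name, "| below 5:", round(below5, 3),
          "| 99% quantile:", round(quantile(draws, 0.99), 1),
          "| max:", round(draws[-1], 1))
```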
t-distribution is coded in
Updated 28 Oct 2019 by Mike Meredith