Burn... or adapt?

HOME Most runs of MCMC chains for Bayesian estimation begin with an adapt phase, when the samplers used to produce the chains are tuned for best performance. I recently did a long run for a complex model, and the output was flagged with "convergence failure". In fact the problem was caused by cutting short the tuning or adapt phase. The default adapt phase for the R package jagsUI is only 100 iterations, which was woefully inadequate for my model. We should be specifying big numbers for n.adapt and small numbers, or even zero, for n.burnin.

I was running a model for Kéry & Royle (2016) section 11.7.2 with JAGS from R via the jagsUI package. I set n.burnin = 5000, n.iter = 15000 and n.thin = 10. I didn't change the default value of n.adapt, which is 100. After 6 hours, I got a result flagged with "convergence failure" and Rhat values > 1.1. Many of the trace plots were ugly, like the first three shown below:

Looking at the trace plots, it's clear that the problem isn't convergence - the seven chains shown are all in the right part of the parameter space; increasing burn-in will not improve matters. But we have poor "mixing": the chains are moving in tiny steps, taking many iterations to cover the full range. Autocorrelation is huge even after thinning, with some effective sample sizes (n.eff) < 20 (from 70,000 generated samples); we'd have to do millions of iterations to get adequate effective sizes.

I repeated the run with n.adapt = 5000 and n.burnin = 100 and the other settings unchanged. Now all the Rhat values were close to 1 and the smallest n.eff's were in the hundreds. The trace plots looked much better:

In fact, I got 1500 values out per chain (after thinning by 10): I had forgotten that n.iter includes burn-in but excludes adaptation, and left it at 15,000.

Why is it a problem now?

We have been using MCMC since the first BUGS software was released back in 1989, but users have not paid much attention to adaptation, focusing on burn-in and chain length.

WinBUGS and OpenBUGS take care of adaptation, and will not let you set Monitors to extract the chains until internal tuning criteria have been met. Adaptation was a check-box in the corner on the Update widget, which we likely only noticed when we tried to set Monitors and couldn't because it was still checked... and we couldn't uncheck it, we just had to run more updates. Those programs silently (sneakily?) used the early burn-in iterations for adaptation.

JAGS separates out the concepts of adaptation and burn-in, and allows the user to terminate adaptation before the tuning criteria are met. Turning off adaptation too soon is now a real danger.

A sensible strategy would be to use all the burn in iterations for tuning. This is adopted by the R2jags package, where the value you specify for n.burnin is actually passed to JAGS as the number to use for adaptation. But if none of the samplers in use needs tuning, JAGS skips adaptation, and then you have no burn in... and no way to specify burn in.

The jagsUI package allows you to specify adaptation and burn in separately, but most users are still thinking "burn in" and ignore the n.adapt argument. The default is only 100 iterations, which is too small for all but the simplest models. Certainly too small for the model I was trying to run!

Recommendations: With jagsUI functions, use the n.adapt argument to set the number of burn in iterations. You can usually leave the n.burnin argument with its default value of zero. And always check the trace plots, at least for the main structural parameters in the model; don't rely entirely on Rhat.

Reporting

Since we first started using BUGS software, the reported burn-in iterations have included the adaptation phase, so no major change is needed to reporting styles. So I might report the second analysis with the phrase: "I retained 1,500 values per chain, after discarding 5,000 for adaptation and burn in and thinning by 10."

Reference

Kéry, M. & Royle, J.A. (2016) Applied Hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS. Elsevier, Amsterdam etc.

Updated 27 Dec 2016 by Mike Meredith