Introducing the saveJAGS package

Have you ever started a long JAGS run and wished you had some indication of progress, or could peek at the results so far? That's possible with the saveJAGS package. It has certainly changed the way I do things. It's a wrapper for rjags which periodically saves the MCMC chains to files. The advantages of that are:
  • you see progress as the files accumulate on the disk,
  • you can load the output saved so far, run diagnostic checks and get preliminary results,
  • if there's a problem, or if you already have enough iterations, you can terminate the run early* without waiting for it to finish,
  • you can save long chains and all the parameters you might need in files, then load into your R session only selected parameters and thinned chains,
  • you no longer need to worry about "Cannot allocate..." memory errors when the JAGS run finishes,
  • and of course you still have your samples in case of power outages or other disasters, and you can resume the JAGS run almost where you left off.

The code is available on Github and you can install the package in R with:
devtools::install_github("mikemeredith/saveJAGS")

The package has tools to check and reconstitute the list of files, summarise the output available, load selected parameters into an mcmc.list object, and resume the run to generate more samples. There is also a utility to convert an mcmc.list to a sims.list.

At present, saveJAGS only runs in parallel, with one chain per core/thread. It uses Pierre L’Ecuyer's random number generator, which is designed specifically for multi-core use but does not (at least in rjags) allow for reproducible chains.
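
saveJAGS takes care of the RNG setup internally, but for background, this is roughly how the L'Ecuyer generator is requested when working with rjags directly (the lecuyer module ships with JAGS):

rjags::load.module("lecuyer")                     # make the generator available
inits1 <- list(.RNG.name = "lecuyer::RngStream")  # request it in a chain's inits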

An example

We'll use a simple occupancy model to demonstrate. This runs very quickly and doesn't really need parallel processing at all, but we can make it run longer by increasing thinning.

You will need at least 3 cores/threads on your machine to run this.
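
If you're not sure how many you have, the parallel package (included with R) can tell you:

parallel::detectCores()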

The data are the number of occasions (out of 5) that salamanders were detected at each of 39 sites:

sal <- rep(0:4, c(21,12,1,4,1))
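
A quick check that the data are as described:

table(sal)   # 21 sites with 0 detections, 12 with 1, 1 with 2, 4 with 3, 1 with 4
length(sal)  # 39 sites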

The preparation is the same as for any JAGS interface: write the model code to a text file and prepare the data, initial values, and parameters to save.

modelText <- "
model {
  for(i in 1:nSites) {
    z[i] ~ dbern(psi)        # true occupancy state of site i
    y[i] ~ dbin(p * z[i], n) # detections; the rate is 0 if the site is unoccupied
  }
  psi ~ dbeta(1, 1)          # flat prior for occupancy probability
  p ~ dbeta(1, 1)            # flat prior for detection probability
} "
writeLines(modelText, con = "JAGSmodel.txt")

JAGSdata <- list(y = sal, n = 5, nSites = length(sal))
inits <- function(chain) list(z = rep(1, 39))  # all sites occupied, consistent with any data
wanted <- c("p", "psi", "z")

That's all the same; here's what's new:

We'll create a new folder/directory for the output files; I like to put a copy of the model file and the JAGS data in this folder too. Then we run saveJAGS, which takes about 3.5 seconds on my laptop. Increase sample2save or thin to get a better idea of how it works with a long run.

dir.create("mySaves")

res1 <- saveJAGS(JAGSdata, inits, wanted, "JAGSmodel.txt",
        chains=3, sample2save=1000, nSaves=4, burnin=1000, thin=10,
        fileStub="mySaves/testing")
str(res1)
summary(res1)

We ran 3 chains with 1000 iterations for adaptation and burn-in; saveJAGS does not have separate adaptation and burn-in phases, but allows adaptation to proceed throughout the burn-in phase. We saved 1000 iterations to each file after thinning by 10; this differs from most JAGS interfaces, where n.iter is the number before thinning. And we saved 4 files for each of 3 chains. In all, we ran 1000 * 10 * 4 * 3 = 120 thousand iterations.

It's worth repeating: sample2save is the number of iterations after thinning, and it applies per file.
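
Spelling out the bookkeeping for the call above:

1000 * 4            # draws saved per chain: sample2save * nSaves = 4000
1000 * 10 * 4       # sampling iterations run per chain (after burn-in) = 40,000
1000 * 10 * 4 * 3   # sampling iterations across all chains = 120,000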

The output, res1, is a list of file names with class saveJAGSfileList.

Look in the "mySaves" folder and you will see files with names such as

testing_B_003_180401_0855.RData

Here "testing" is the file stub we specified, and this is the third set of samples from chain B. The date-time stamp gives the start time for this block of samples; look at the file's "Date modified" to see when it finished. The chains do not all run at the same rate, and files are written at different times; this can add up to 15-20 mins over 24hrs. Also file sizes reported by Windows are not identical, which I don't understand, they must have exactly the same number of bytes inside.

We don't need the output, res1, as we can rebuild the file list from the folder. We don't even need to wait until saveJAGS has finished to build the file list and start the checks and analysis. The simplest approach is to start a new instance of R and run recoverSaves; or you can copy the files to another machine and work on them there. I now do this regularly for runs taking more than half an hour or so.

res2 <- recoverSaves("mySaves/testing")
all.equal(res1, res2) # TRUE

We can get an overview of the output with summary(res1), though this only opens one of the files and does not check the rest. We can load all the files to create an mcmc.list object with combineSaves(res1). If we have huge models with thousands of parameters (eg, 2 parameters for each of 158 bird species at 267 sites in Switzerland = 84,372 parameters) and long MCMC chains, we can run into memory issues in R. To avoid this, combineSaves allows you to load a subset of the parameters saved and to thin when loading:

mcmc1 <- combineSaves(res1) # default
str(mcmc1)
mcmc2 <- combineSaves(res1, params=c("psi","p"), thin=4)
str(mcmc2) # now without 'z', and 1000 iterations per chain instead of 4000
object.size(mcmc1) # 3945576 bytes
object.size(mcmc2) #   51024 bytes

A number of diagnostic tests and plots are available for mcmc.list objects in the coda package, but I often prefer to convert to a Bwiqid object and use the tools in the wiqid package:

library(wiqid)
bw <- as.Bwiqid(mcmc1)
bw
diagPlot(bw)
plot(bw)
plot(bw, "psi")

Another useful format is the sims.list, included in the output of most rjags wrappers. This can be created with the simsList function:

sl <- simsList(mcmc1)
str(sl)
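
Assuming the usual sims.list layout, with one element per parameter and the draws pooled across chains, the elements can be used directly:

hist(sl$psi)      # posterior distribution of occupancy probability
mean(sl$z[, 1])   # posterior probability that site 1 is occupied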

See ?simsList for a more interesting example.

Resuming the JAGS run after an interruption

The saved files contain not just the MCMC samples but also the state of the model (the values of all the parameters and the random number generator) and details of the modules loaded and samplers active. No burn-in is needed when resuming, though a few iterations are discarded for adaptation. If R was terminated brutally, you may find that some chains have more files than others, and there may even be damaged files. These need to be cleaned up before resuming the run.
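
One way to spot uneven chains, assuming the file-name pattern shown earlier (stub_chain_number_datetime.RData):

ff <- list.files("mySaves", pattern = "\\.RData$")
chain <- sub("^testing_([A-Z])_.*", "\\1", ff)   # extract the chain letter
table(chain)   # counts should be equal; remove surplus or damaged files before resuming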

To resume the run, call resumeJAGS and pass it the file stub for the previous run and the number of additional files you want per chain. All the other information needed to relaunch JAGS is in the last files from the previous run.

newRes <- resumeJAGS(fileStub="mySaves/testing", nSaves=3)
summary(newRes)

Now we have 7 files for each chain: 4 from the first run and 3 from the second. We can work with these in the same way as before, and resume again to do more iterations.

How it works

We use the fact that the rjags::coda.samples function can be called repeatedly to extract samples from a model, with sample generation each time resuming where it left off. Each worker extracts the stipulated number of samples as an mcmc.list with a single chain and saves it as an .RData file. This is repeated at the worker level until the required number of files has been written; doing it at the worker level avoids having to recompile the model and start a new chain. On completion, the workers return the file names to the master R process, and the list is returned to the user.
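
In essence, each worker does something like this (a simplified sketch, not the package's actual code; cluster setup, saving of the model state, and the real file naming are omitted, and nSaves, sample2save, thin and fileStub stand for the arguments described above):

jm <- rjags::jags.model("JAGSmodel.txt", data = JAGSdata, inits = inits(1),
                        n.chains = 1)
for(block in 1:nSaves) {
  samples <- rjags::coda.samples(jm, wanted, n.iter = sample2save * thin,
                                 thin = thin)   # carries on where it left off
  save(samples, file = paste0(fileStub, "_", block, ".RData"))
}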

Still to do

  • More examples and unit tests, then submit to CRAN.
     
  • More 'sanity checks' and more informative error messages if there is a problem.
     
  • Profiling to optimise speed and memory use - well, memory use anyway: if the JAGS run took 5 hrs, waiting 10 secs to load the output isn't a problem.
     
  • Provide functions to load output from files directly into sims.list, Bwiqid or matrix formats without first creating an mcmc.list object, thus reducing memory demands.
     
  • Provide a light-weight alternative to coda::gelman.diag, as that seems to hang or even cause R to crash with large numbers of parameters; it should report only parameters with high Rhat rather than produce a huge list; similarly a wrapper for effectiveSize that picks out the problem parameters; cf. the bigCrossCorr function in the AHMbook package. A rough sketch of the Rhat idea follows this list.
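
For that last item, here is a minimal sketch of the idea: a basic Gelman-Rubin point estimate, without the degrees-of-freedom correction that coda::gelman.diag applies (simpleRhat is a hypothetical name, not a package function):

simpleRhat <- function(mcmcList, cutoff = 1.1) {
  mats <- lapply(mcmcList, as.matrix)       # iterations x parameters, one per chain
  n <- nrow(mats[[1]])
  chainMeans <- sapply(mats, colMeans)                     # parameters x chains
  chainVars <- sapply(mats, function(m) apply(m, 2, var))  # parameters x chains
  B <- n * apply(chainMeans, 1, var)        # between-chain variance
  W <- rowMeans(chainVars)                  # mean within-chain variance
  Rhat <- sqrt(((n - 1) / n * W + B / n) / W)
  Rhat[Rhat > cutoff]                       # report only the problem parameters
}

simpleRhat(mcmc1)   # an empty named vector if nothing exceeds the cutoff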

 

* Terminating the R processes: Note that 'Esc' in R ends the master process in the Console, but it does not stop the workers. They will continue, zombie-style, until they have finished their assigned task, even if you close R. So you still have to clean up.

To stop the workers, use End task in Task Manager on Windows, or Force Quit in Activity Monitor on a Mac. Or reboot the computer.

Updated 8 April 2018 by Mike Meredith