"Partial pooling" and hierarchical models

How does hierarchical modelling help in getting sensible estimates for cases where very few data are available? We'll explore that here, using the concept of "pooling" data from multiple cases.

Suppose we set out camera traps in our study area with the intention of using the proportion of days when we detected our target species as an index of abundance. This is sometimes called the Relative Abundance Index, RAI; I'm not sure how useful it is in general, but let's assume it works for the species we are interested in. Our toy data set has only 9 cameras, and they were put out for 2 months. Four of the cameras operated properly for all 60 days, but four of them failed after only a few days and one didn't work at all. Can we draw useful inferences about the sites with hardly any data?

Here are the results, with the number of nights operational, n, number of nights with detections, y, and a naive estimate of RAI = y/n.

n <- c(60, 60, 60, 60, 7, 6, 4, 3, 0)
y <- c(42, 38, 21, 19, 3, 5, 0, 2, 0)
names(n) <- LETTERS[1:9]
cbind(n=n, y=y, R=round(y/n, 2))
   n  y    R
A 60 42 0.70
B 60 38 0.63
C 60 21 0.35
D 60 19 0.32
E  7  3 0.43
F  6  5 0.83
G  4  0 0.00
H  3  2 0.67
I  0  0  NaN

Based on this, we can be pretty sure that abundance is higher at sites A and B than at sites C and D. But what do we make of sites E to H? We don't have enough data to say very much; in particular, it would be rash to conclude that the species was absent (abundance = 0) at site G based on just 4 nights of data. And we can't say anything about site I.

Pooling the data

One way to get around the problem of sites with few data - or no data at all, as with site I - is to pool the data from all the sites. In this case we have a total of 260 trap nights with detections on 130 of them:

sum(y)/sum(n)
[1] 0.5

So now we conclude that RAI = 0.5 for all the sites. Hmm, well, that solves the problem for site I, and it may be a better estimate than the unpooled values for sites E to H, but it ignores the clear differences among sites A to D. This is not a good solution.

Partial pooling

We need a solution that takes account of the overall estimate, 0.5, but also makes use of the information in the data we have, whether it's a lot or just a little.

The first 4 sites have lots of data. The high unpooled estimates for A and B might be slightly too high and the low estimates for C and D slightly too low, but we don't want to change them very much.

We have few data for sites E to H, so the estimates there should be close to the overall estimate, but still we'd want them to be a bit higher for F and H and a bit lower for E and especially G.

And for site I, we'd go with the overall estimate, as it's the only information available.

That's nice in principle. Let's see how we calculate defensible estimates.

Fitting the models

You shouldn't be surprised that calculating estimates of parameters involves fitting models! And we'll do this in a Bayesian framework using JAGS.
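
The code below relies on a jags() wrapper to run JAGS from R - I'm assuming the one in the jagsUI package - plus diagPlot() from the wiqid package, which is called with :: so it only needs to be installed. JAGS itself must be installed on your machine separately.

# Assumes the jags() wrapper comes from the 'jagsUI' package
library(jagsUI)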

No pooling

Without resorting to pooling, we can't say anything about site I, so we fit the model to just the first 8 sites. Here's the JAGS code:

modeltxt <- "
model{
  for(i in 1:8) {
    # observation model
    y[i] ~ dbin(R[i], n[i])
    # prior for each R[i]
    R[i] ~ dbeta(1, 1)
  }
}"
writeLines(modeltxt, con="nopooling.jags")

We are treating each trap night as a trial with a success if the species is detected. The estimated RAI is simply the probability of success, R, for which we use a uniform prior.
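
As an aside, the Beta(1, 1) prior is conjugate to the binomial, so for this simple model we can write down the posterior for each site directly: R[i] has a Beta(y+1, n-y+1) posterior with mean (y+1)/(n+2). The MCMC means below should match these values:

# Analytical posterior means with a Beta(1, 1) prior, first 8 sites
round((y[1:8] + 1) / (n[1:8] + 2), 3)
    A     B     C     D     E     F     G     H 
0.694 0.629 0.355 0.323 0.444 0.750 0.167 0.600 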

We run the code (we don't need to supply initial values) and look at diagnostic plots:

jdata <- list(y=y, n=n)
wanted <- "R"

( outnp <- jags(jdata, NULL, wanted, "nopooling.jags", DIC=FALSE,
  n.burn=100, n.iter=1100, n.chains=3) )
wiqid::diagPlot(outnp)

The diagnostic plots look ok, and here are the results:

      mean    sd  2.5%   50% 97.5% overlap0 f  Rhat n.eff
R[1] 0.694 0.059 0.574 0.696 0.801    FALSE 1 1.003   853
R[2] 0.628 0.061 0.507 0.629 0.744    FALSE 1 1.002  1378
R[3] 0.354 0.060 0.241 0.352 0.478    FALSE 1 1.002  3000
R[4] 0.321 0.057 0.214 0.320 0.437    FALSE 1 1.002  1667
R[5] 0.440 0.157 0.149 0.436 0.742    FALSE 1 1.003   720
R[6] 0.753 0.144 0.443 0.776 0.968    FALSE 1 1.000  3000
R[7] 0.163 0.141 0.004 0.123 0.526    FALSE 1 1.009   277
R[8] 0.590 0.206 0.171 0.603 0.926    FALSE 1 1.003  1185

The means are all pretty close to the y/n estimates, except for R[7], site G, where y/n = 0/4. Check the diagnostic plots for R[7] and you'll see that the density peaks at 0 and then declines rather slowly; you could easily get a result of 0 out of 4 with true R = 0.2.
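
We can check that last claim directly: with true R = 0.2, there's about a 41% chance of seeing no detections in 4 nights.

# Probability of 0 detections in 4 nights if the true RAI is 0.2
dbinom(0, 4, 0.2)
[1] 0.4096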

Total pooling

With total pooling (also known as "complete pooling"), we estimate just one value for the RAI which (we assume) applies to all the sites. We create the JAGS model for that, now including all nine sites, and then run it with the same data list:

modeltxt <- "
model{
  for(i in 1:9) {
    # observation model
    y[i] ~ dbin(R, n[i])
  }
  # prior for R
  R ~ dbeta(1, 1)
}"
writeLines(modeltxt, con="totpooling.jags")

( outtp <- jags(jdata, NULL, wanted, "totpooling.jags", DIC=FALSE,
  n.burn=100, n.iter=1100, n.chains=3) )
wiqid::diagPlot(outtp)

  mean    sd  2.5%   50% 97.5% overlap0 f Rhat n.eff
R  0.5 0.031 0.439 0.501 0.559    FALSE 1    1  3000
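
The conjugate result gives us a check here too: with everything pooled, R has a Beta(sum(y)+1, sum(n)-sum(y)+1) = Beta(131, 131) posterior, and its mean and SD match the MCMC summary:

# Analytical mean and SD of the pooled Beta(131, 131) posterior
a <- sum(y) + 1 ; b <- sum(n) - sum(y) + 1
round(c(mean = a/(a+b), sd = sqrt(a*b/((a+b)^2 * (a+b+1)))), 3)
 mean    sd 
0.500 0.031 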

Partial pooling

We need a separate R for each site, but each R must take account of what's happening at the other sites too, so the model is a good deal more sophisticated. In fact, it's a multi-level or hierarchical model. Let's show the JAGS code and then discuss the details:

modeltxt <- "
model{
  for(i in 1:9) {
    # observation model
    y[i] ~ dbin(R[i], n[i])
    # prior for R[i]
    logitR[i] ~ dnorm(mu, tau)
    R[i] <- ilogit(logitR[i])
  }
  # hyperpriors for mu and tau
  Rmean ~ dbeta(1, 1)
  mu <- logit(Rmean)
  sd ~ dunif(0, 10)
  tau <- sd^-2
}"
writeLines(modeltxt, con="parpooling.jags")

Instead of each R being drawn from its own Beta(1, 1) prior, the R's have a single, informative prior, which contains information from all the sites - the overall RAI and the variation among sites. There are various ways to set up this single prior, but since we are familiar with logit models, I used a normal distribution for logitR which is then converted to R with ilogit. The normal distribution has mean mu and precision tau, the same for all R's.

So what are the values for mu and tau? We don't know, but we can estimate them from the data: this is another, "higher" level of the model. But we need priors for this level, sometimes known as hyperpriors because they are priors for priors. The overall RAI, Rmean, is a probability, so I use a Beta(1, 1) prior for that, then get mu from logit(Rmean). I use a uniform prior for the spread (sd) of the logitR's, then convert that to precision, tau.
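
To see what these hyperpriors imply for R on the probability scale before any data arrive, we can simulate from them in R; plogis and qlogis are R's versions of ilogit and logit, and the object names here are just for the illustration:

# Prior-predictive sketch: draw hyperparameters from their priors,
# then site-level R's on the probability scale
set.seed(123)  # reproducible illustration only
Rmean.sim <- rbeta(10000, 1, 1)
sd.sim <- runif(10000, 0, 10)
R.sim <- plogis(rnorm(10000, qlogis(Rmean.sim), sd.sim))
hist(R.sim, breaks = 50, main = "Draws of R from the hierarchical prior", xlab = "R")

Allowing sd to be as large as 10 puts quite a lot of prior weight near 0 and 1, but the data pull sd down to more moderate values, as we'll see in the output below.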

We use the same data, but now we also want to monitor Rmean and sd as well as R. Mixing is not so good with hierarchical models, so I increased the number of iterations.

wanted <- c("R", "Rmean", "sd")

( outpp <- jags(jdata, NULL, wanted, "parpooling.jags", DIC=FALSE,
  n.burn=100, n.iter=10100, n.chains=3) )
wiqid::diagPlot(outpp)

       mean    sd  2.5%   50% 97.5% overlap0 f  Rhat n.eff
R[1]  0.683 0.059 0.562 0.685 0.793    FALSE 1 1.000 15451
R[2]  0.622 0.061 0.501 0.623 0.737    FALSE 1 1.000 30000
R[3]  0.362 0.061 0.248 0.360 0.485    FALSE 1 1.000 15232
R[4]  0.332 0.059 0.223 0.330 0.452    FALSE 1 1.000 30000
R[5]  0.454 0.145 0.178 0.453 0.741    FALSE 1 1.000 30000
R[6]  0.686 0.149 0.382 0.694 0.940    FALSE 1 1.000 17285
R[7]  0.268 0.161 0.014 0.255 0.601    FALSE 1 1.000 30000
R[8]  0.565 0.181 0.209 0.567 0.904    FALSE 1 1.000  9748
R[9]  0.495 0.237 0.046 0.495 0.943    FALSE 1 1.000  9977
Rmean 0.493 0.110 0.263 0.495 0.710    FALSE 1 1.000 16488
sd    1.169 0.637 0.429 1.021 2.733    FALSE 1 1.006  2083

We have a sensible value for Rmean, though the CrI is pretty wide as it's based on only 4 sites with good data. We also have an estimate for site I, R[9], which is close to Rmean but with an even wider CrI.

Let's compare the results of the different methods as a table and a plot:

res <- cbind(n=n,
             y=y,
             naive = y/n,
             np = c(outnp$mean$R, NA),
             tp = rep(outtp$mean$R, 9),
             pp = outpp$mean$R)
round(res, 2)

   n  y naive   np  tp   pp
A 60 42  0.70 0.69 0.5 0.68
B 60 38  0.63 0.63 0.5 0.62
C 60 21  0.35 0.35 0.5 0.36
D 60 19  0.32 0.32 0.5 0.33
E  7  3  0.43 0.44 0.5 0.46
F  6  5  0.83 0.75 0.5 0.69
G  4  0  0.00 0.16 0.5 0.26
H  3  2  0.67 0.59 0.5 0.57
I  0  0   NaN   NA 0.5 0.49
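
There are many ways to draw the plot; here is one simple sketch with base graphics, using the res object built above:

# Compare the estimates by site (a sketch, one of many possible plots)
plot(1:9, res[, "naive"], xlim = c(0.5, 9.5), ylim = c(0, 1), pch = 1,
     xaxt = "n", xlab = "Site", ylab = "Estimated RAI")
axis(1, at = 1:9, labels = rownames(res))
points(1:9 + 0.1, res[, "np"], pch = 2)
points(1:9 - 0.1, res[, "pp"], pch = 16)
abline(h = res[1, "tp"], lty = 2)
legend("topright", c("naive", "no pooling", "partial pooling", "total pooling"),
       pch = c(1, 2, 16, NA), lty = c(NA, NA, NA, 2))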

This has done what we wanted: the sites with lots of data, A-D, are almost unaffected, but those with few data, E-H, have been pulled towards the overall mean. We've even got an estimate for the site with no data, though with a wide CrI; a key problem there is the wide variability among sites: we don't know if site I is as high as A or B or even higher, or as low as C or D or even lower.
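
A quick way to see how much each estimate has moved is to take the difference between the partial-pooling and naive values:

# Shrinkage: partial-pooling estimate minus naive estimate for each site
round(res[, "pp"] - res[, "naive"], 2)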

 

Updated 19 July 2019 by Mike Meredith