Categorical covariates in JAGS

I've recently been asked about coding categorical covariates in JAGS. You can do it in the usual frequentist manner, with dummy variables for all but one of the categories, but JAGS allows other, more easily interpreted ways of coding, especially if there is only one categorical covariate in the model and it is a logistic or log regression. Many of our wildlife analyses have logistic models under the hood.

In R, the class factor is used for categorical variables. A factor has a levels attribute with the names of the categories, in alphabetical order by default, and the values are stored as integers giving the level number.
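
A quick illustration of that internal coding (toy values, just for demonstration):

f <- factor(c("b", "a", "b"))
levels(f)      # "a" "b" : alphabetical by default
as.integer(f)  # 2 1 2  : the stored level numbers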

A toy example

I'll generate some simulated data for a logistic regression model (as that generalises to a lot of our other models). We have four players with different degrees of skill and a continuous covariate which does nothing.

playerNames <- c("Andy", "Bob", "Chan", "Dora")
skill <- c(0.3, 0.6, 0.7, 0.8)
set.seed(2017)
player.char <- sample(playerNames, size=200, replace=TRUE, prob=c(1,3,2,2))
player <- factor(player.char)
summary(player)
Andy  Bob Chan Dora 
  25   84   52   39 
success <- rbinom(200, 1, prob=skill[as.integer(player)])
fluff <- rnorm(200)
tapply(success, player, mean) # Check: should be close to 'skill'
     Andy       Bob      Chan      Dora 
0.2800000 0.5833333 0.7115385 0.8461538 

I've used character strings for the player names and then for the levels of the player factor (they appear in the summary), which I think is much safer than using numbers for categories. A factor can easily be converted to integers if necessary, as in the call to rbinom above. Note that the number of observations per category varies, something that happens often with real data.

Frequentist analysis with glm

We'll quickly run a glm analysis for the full model:

glm(success ~ player + fluff, family='binomial')

Call:  glm(formula = success ~ player + fluff, family = "binomial")

Coefficients:
(Intercept)      playerBob     playerChan     playerDora        fluff  
    -0.9589         1.2927         1.8726         2.6850       0.1198  

Degrees of Freedom: 199 Total (i.e. Null);  195 Residual
Null Deviance:      263.6 
Residual Deviance: 239.2        AIC: 249.2

Notice that we have coefficients for Bob, Chan and Dora, but not for Andy. The intercept refers to Andy's plays and the other coefficients give the difference between Andy and each of the other players. It isn't possible to have an intercept and a coefficient for each player, as they would be confounded: you could add any value you liked to the intercept provided you subtracted the same value from the player coefficients and you'd get exactly the same result.
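
We can see this concretely by recovering the fitted success probabilities at fluff = 0 from the coefficients: only the sums of the intercept and a player coefficient enter the fitted values, which is why one coefficient must be fixed. (A quick check of my own, saving the fit first.)

fit <- glm(success ~ player + fluff, family='binomial')
co <- coef(fit)
# Andy is the intercept alone; the others add their difference from Andy:
plogis(co["(Intercept)"] + c(Andy=0, co[2:4]))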

The null model - no covariates

To begin with, let's run a JAGS model with no covariates. In this case we need to estimate a single probability of success, p, and do not need the logit link.

modelText <- "
model {
  for(i in 1:length(success)) {
    success[i] ~ dbern(p)
  }

  # Priors
  p ~ dbeta(1,1)
}"
writeLines(modelText, con="modelNull.txt")

Here I'm using a Beta prior, which is the appropriate distribution for a probability, as it is restricted to the range 0 to 1. Beta(1,1) is a uniform prior with little information, and no one is likely to question its choice, except perhaps to suggest a more informative prior, which could easily be coded with, say, p ~ dbeta(2,3). Many people use Uniform(0,1) for a uniform prior instead of Beta(1,1); that's mathematically equivalent, but JAGS does not recognise it as a conjugate prior for a probability, so sampling is less efficient.
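
If you want to see what a more informative Beta prior looks like before committing to it, a quick plot helps (my own illustration):

curve(dbeta(x, 1, 1), from=0, to=1, ylim=c(0, 2), xlab="p", ylab="Prior density")
curve(dbeta(x, 2, 3), add=TRUE, lty=2)  # a mildly informative alternative
legend("topright", c("Beta(1,1)", "Beta(2,3)"), lty=1:2)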

JAGSdata <- list(success=success)
wanted <- "p"

library(jagsUI)
outNull <- jags(JAGSdata, NULL, wanted, "modelNull.txt",
  n.chains=3, n.adapt=100, n.iter=2000)
outNull
...
            mean    sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
p          0.629 0.033   0.563   0.629   0.692    FALSE 1 1.000  3668
...
plot(outNull)

This works fine and the plots (not shown) indicate good mixing and convergence.

One categorical covariate

If we have only one covariate and that covariate is categorical, the simplest strategy is to estimate a value for p for each category. We do not use the logit link, and dealing directly with p simplifies interpretation and discussion of priors.

modelText <- "
model {
  for(i in 1:length(success)) {
    success[i] ~ dbern(p[player[i]])
  }

  # Priors
  for(i in 1:4) {
    p[i] ~ dbeta(1,1)
  }
}"
writeLines(modelText, con="modelPlayer.txt")

Again I'm using a Beta(1,1) prior, which could easily be replaced by an informative beta prior, a different one for each player if you wanted.

JAGSdata <- list(success=success, player=as.integer(player))
str(JAGSdata)
List of 2
 $ success: int [1:200] 0 1 0 0 1 1 0 1 1 1 ...
 $ player : int [1:200] 1 3 3 2 4 4 2 3 3 2 ...
wanted <- "p"

The factor player was converted to integers for JAGS, with Andy = 1, Bob = 2, etc. This is used as the index into the p vector so that the right p is used for each observation in the success vector.
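
If you want to check the mapping between the integer codes and the player names, line them up for the first few observations:

head(data.frame(code = as.integer(player), name = as.character(player)))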

outPlayer <- jags(JAGSdata, NULL, wanted, "modelPlayer.txt",
  n.chains=3, n.adapt=100, n.iter=2000)
outPlayer
...
            mean    sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
p[1]       0.299 0.087   0.144   0.294   0.487    FALSE 1 1.001  1627
p[2]       0.582 0.053   0.479   0.582   0.684    FALSE 1 1.000  3484
p[3]       0.705 0.061   0.578   0.708   0.817    FALSE 1 1.000  6000
p[4]       0.830 0.058   0.706   0.835   0.927    FALSE 1 1.001  2855
...
# Compare with the proportion of successes for each player:
tapply(success, player, mean)
     Andy       Bob      Chan      Dora 
0.2800000 0.5833333 0.7115385 0.8461538 

The posterior means are close to the actual proportion of successes in the data, though pulled somewhat towards 0.5, as we would expect: with a Beta(1,1) prior and k successes in n attempts, the posterior is Beta(k+1, n-k+1), which has its mode at the observed proportion k/n but, being skewed, has its mean at (k+1)/(n+2), closer to 0.5.
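
Since each p has a Beta(1,1) prior and a binomial likelihood, the posterior here is available in closed form, so we can check the MCMC output directly (a check of mine, with k and n computed from the data):

# Posterior is Beta(k+1, n-k+1); its mean is (k+1)/(n+2)
k <- tapply(success, player, sum)
n <- tapply(success, player, length)
(k + 1) / (n + 2)  # should match the posterior means above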

More than one covariate

With more than one covariate we do need to use the logit link. There are several strategies, but with only one categorical covariate, we can keep it simple by using a different intercept for each category.

The intercept for each player is the logit of the probability of success when fluff = 0, which is effectively the mean of fluff. So we might be able to provide a sensible prior for the probability of success. In the code below, I've specified a uniform prior, Beta(1,1), for the intercepts on the probability scale, then converted to the logit scale for use in the regression. You could use an informative Beta prior for each player if you wished.

One intercept per category

modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }

  # Priors
  for(i in 1:4) {
    pPlayer[i] ~ dbeta(1,1)         # prior specified on probability scale...
    bPlayer[i] <- logit(pPlayer[i]) # ... then converted to logit scale.
  }
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model1.txt")

Here I'm using my favourite uniform priors, dunif(-5,5), for logistic regression coefficients when continuous covariates are standardised (fluff was generated with mean = 0 and SD = 1). These may not be appropriate for your analysis, but it is easy to check whether the prior is constraining the posterior, as shown after the output below.

JAGSdata <- list(success=success, player=as.integer(player), fluff=fluff)
str(JAGSdata)
List of 3
 $ success: int [1:200] 0 1 0 0 1 1 0 1 1 1 ...
 $ player : int [1:200] 1 3 3 2 4 4 2 3 3 2 ...
 $ fluff  : num [1:200] 0.48 0.688 -0.201 0.101 -0.577 ...
wanted <- c("bPlayer", "bFluff")
out1 <- jags(JAGSdata, NULL, wanted, "model1.txt",
  n.chains=3, n.adapt=100, n.iter=2000)
out1
...
              mean    sd    2.5%     50%   97.5% overlap0     f  Rhat n.eff
bPlayer[1]  -1.013 0.448  -1.924  -0.998  -0.157    FALSE 0.989 1.002  5123
bPlayer[2]   0.337 0.221  -0.095   0.338   0.764     TRUE 0.936 1.002  1195
bPlayer[3]   0.943 0.316   0.345   0.931   1.588    FALSE 0.999 1.000  6000
bPlayer[4]   1.835 0.466   1.011   1.811   2.827    FALSE 1.000 1.000  6000
bFluff       0.123 0.166  -0.196   0.122   0.448     TRUE 0.771 1.000  6000
...
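
Since the priors are Uniform(-5,5), a quick check (using out1) confirms that the posterior draws stay well inside those limits:

range(out1$sims.list$bPlayer)
range(out1$sims.list$bFluff)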

The bPlayer coefficients are on the logit scale, so we need to convert to probabilities (for the cases where fluff = 0) for comparison with the known probabilities:

logitP1 <- out1$sims.list$bPlayer
P1 <- plogis(logitP1)
colMeans(P1)
[1] 0.275 0.582 0.715 0.853

One category as reference category

The method above won't work if you have more than one categorical covariate. An alternative is to use the strategy adopted by default by glm: the model has an intercept which refers to one of the categories, and coefficients for the difference between this reference category and each of the others. We adapt our code by adding the intercept, b0, and setting the coefficient for the reference category to zero.

modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- b0 + bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }

  # Priors
  b0 ~ dunif(-5,5)
  bPlayer[1] <- 0       # First level is reference category
  for(i in 2:4) {
    bPlayer[i] ~ dunif(-5,5) # Difference between each player and the reference player
  }
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model2.txt")

Here we might say that b0 is the intercept for the reference player, so we can provide a sensible prior for that, e.g., p0 ~ dbeta(1,1) ; b0 <- logit(p0). But I have no intuitive feel for how you would choose priors for the differences (on the logit scale) among players.

The data list, JAGSdata, is the same, and we add b0 to the parameters in wanted.
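
The call itself is the same apart from the model file; I'm assuming the result is saved as out2, which is used below:

wanted <- c("b0", "bPlayer", "bFluff")
out2 <- jags(JAGSdata, NULL, wanted, "model2.txt",
  n.chains=3, n.adapt=100, n.iter=2000)

The output looks like this: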

              mean    sd    2.5%     50%   97.5% overlap0     f  Rhat n.eff
b0          -0.974 0.444  -1.891  -0.970  -0.117    FALSE 0.988 1.018   147
bPlayer[1]   0.000 0.000   0.000   0.000   0.000    FALSE 1.000    NA     1
bPlayer[2]   1.312 0.495   0.366   1.309   2.294    FALSE 0.996 1.014   172
bPlayer[3]   1.916 0.540   0.881   1.904   3.017    FALSE 1.000 1.011   255
bPlayer[4]   2.773 0.634   1.563   2.749   4.068    FALSE 1.000 1.008   276
bFluff       0.124 0.165  -0.196   0.123   0.447     TRUE 0.773 1.001  2184

Notice that the effective sample sizes, n.eff, are very low - we ran 3 chains of 2000 iterations = 6000 draws in total - and the trace plots (not shown) suggest poor mixing. The main reason for this is that the reference category, Andy, has relatively few observations, only 25 out of 200. That makes it difficult to estimate the intercept, and hence difficult to estimate the other coefficients.

We can improve things by making Bob, who has the most observations, the reference category. That could be done by changing the JAGS code (set bPlayer[2] <- 0 and give bPlayer[1] a prior instead), but it's easier to change the data by relevelling the player factor so that Bob comes first:

player_Bob <- relevel(player, ref="Bob")
summary(player_Bob)
Bob Andy Chan Dora 
 84   25   52   39 
JAGSdata <- list(success=success, player=as.integer(player_Bob), fluff=fluff)
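
The call is as before; note that this overwrites out2, so the skill calculations below use the relevelled run:

out2 <- jags(JAGSdata, NULL, wanted, "model2.txt",
  n.chains=3, n.adapt=100, n.iter=2000)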

The output is now:

              mean    sd    2.5%     50%   97.5% overlap0     f  Rhat n.eff
b0           0.337 0.222  -0.078   0.329   0.783     TRUE 0.941 1.000  4153
bPlayer[1]   0.000 0.000   0.000   0.000   0.000    FALSE 1.000    NA     1
bPlayer[2]  -1.349 0.517  -2.431  -1.332  -0.387    FALSE 0.998 1.000  6000
bPlayer[3]   0.599 0.383  -0.151   0.593   1.364     TRUE 0.942 1.000  4634
bPlayer[4]   1.467 0.503   0.510   1.453   2.502    FALSE 0.999 1.001  2687
bFluff       0.124 0.166  -0.199   0.123   0.450     TRUE 0.770 1.001  2864

Effective sample sizes are much better and the diagnostic plots now look good.

To get the estimated skill levels (with fluff = 0) we need to add the intercept, b0, to the bPlayer coefficients:

logitP2 <- out2$sims.list$bPlayer + out2$sims.list$b0
P2 <- plogis(logitP2)
colMeans(P2)
[1] 0.581 0.276 0.714 0.850

... remembering that Bob is now first in the list and Andy is second.

Categorical coefficients sum to 0

Choosing a reference category can make interpretation of the output difficult, especially if there are several categorical covariates, each with its own reference category. An alternative is to have a (non-zero) coefficient for each category but to ensure that they add up to 0 - or, if you prefer, the mean is 0. Here's the JAGS model:

modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- b0 + bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }

  # Priors
  b0 ~ dunif(-5,5)
  for(i in 1:4) {
    temp[i] ~ dnorm(0, 0.01)
  }
  bPlayer <- temp - mean(temp)
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model3.txt")

The bPlayer coefficients are generated in two steps: first we draw a set of temp values from broad normal distributions, then we subtract their mean from each. This time I'm not using my dunif(-5,5) priors for temp: only the differences among the temp values matter, so the values themselves are not identified and would drift up against the limits. The normal prior with mean 0 keeps them within a reasonable range.
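
The output below has Andy first again, so I'm assuming the data list reverts to the original player order, with the result saved as out3, which the checks below use:

JAGSdata <- list(success=success, player=as.integer(player), fluff=fluff)
wanted <- c("b0", "bPlayer", "bFluff")
out3 <- jags(JAGSdata, NULL, wanted, "model3.txt",
  n.chains=3, n.adapt=100, n.iter=2000)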

Here is the output:

              mean    sd    2.5%     50%   97.5% overlap0     f  Rhat n.eff
b0           0.531 0.187   0.174   0.528   0.912    FALSE 0.998 1.000  6000
bPlayer[1]  -1.506 0.377  -2.279  -1.498  -0.803    FALSE 1.000 1.001  3560
bPlayer[2]  -0.189 0.244  -0.674  -0.190   0.291     TRUE 0.787 1.002  2120
bPlayer[3]   0.406 0.296  -0.164   0.401   0.986     TRUE 0.919 1.000  6000
bPlayer[4]   1.289 0.381   0.597   1.277   2.102    FALSE 1.000 1.001  1909
bFluff       0.124 0.170  -0.207   0.122   0.453     TRUE 0.768 1.000  6000

We can check that the bPlayer coefficients do sum to zero for every MCMC draw (the tiny discrepancies are floating-point error):

tst <- rowSums(out3$sims.list$bPlayer)
range(tst)
[1] -1.065814e-14  1.065814e-14

This time the individual bPlayer coefficients tell us the difference in skill for each player from the mean of all players (on the logit scale). The mean with fluff = 0 is given by the intercept. So to recover estimates of skill we need to add the intercept:

logitP3 <- out3$sims.list$bPlayer + out3$sims.list$b0
P3 <- plogis(logitP3)
colMeans(P3)
[1] 0.283 0.584 0.714 0.852

Notice that we are using the "mean of all players" here; we do not take into account that the less-skilled Andy makes only a small number of attempts. It would be possible to use a weighted mean, with weights proportional to the number of attempts, if that were needed, as sketched below.
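
Here's a minimal sketch of how that post-hoc re-centring might look; the weights and object names are my own, not part of the analysis above:

# Hypothetical: re-centre the coefficients around an attempt-weighted mean
w <- as.numeric(table(player)) / length(player)  # share of attempts per player
bP <- out3$sims.list$bPlayer                     # draws: n.sims x 4 matrix
shift <- as.vector(bP %*% w)                     # weighted mean, one per draw
bP_w <- sweep(bP, 1, shift)                      # weighted mean now 0
b0_w <- out3$sims.list$b0 + shift                # intercept absorbs the shift
# The linear predictor is unchanged: (b0 + shift) + (bP - shift) = b0 + bP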

Updated 1 February 2018 (better object names and typos fixed) by Mike Meredith