## Royle-Nichols models for occupancy |
|||||||||||||||

HOME |
In occupancy modelling we are interested in the probability that a species is
present at a location, taking account of the possibility that it may be there
but not be detected. So we need to jointly estimate probability of occupancy and
probability of detection. In many studies we would expect probability of
occupancy to be affected by the number of individuals at the location, so it
would make sense to include that as a covariate in the analysis. But of course
we don't know how many are present - if we did we would not be trying to
estimate occupancy. The Royle-Nichols model (Royle & Nichols 2003) treats the
number available for detection at a site as a latent variable which can be
estimated from detection/non-detection data. In the simplest model, the number available for detection at site \( i \), \( N_i \), is modelled as being drawn from a Poisson distribution with parameter \( \lambda \). This is the biological model. The observation model assumes that detection of each individual is a Bernoulli trial, individuals being detected with probability \( r \). The data do not show which individuals are detected, just whether the species was detected, and that is recorded if at least one individual is observed. To get from the probability of detecting one individual, \( r \), to the probability of detecting the species, \( p \), think of the probability of non-detection: The probability of not detecting an individual is \( 1 - r \) and if \( N_i \) individuals are present, the probability of detecting none of them is \( (1 - r)^{N_i} \). Detecting "at least one" is equivalent to detecting "not none", and the probability of that is \( p = 1 - (1 - r)^{N_i} \). The model parameters to be estimated are \( \lambda \) and \( r \). The probability of occupancy, \( \psi \), is derived from \( \lambda \): \[ \psi = {\rm I\!P}(N_i > 0) = 1 - e^{-\lambda} \] As usual, we can model \( \lambda_i \) as a function of site covariates, and \( r_{it} \) as a function of site and survey covariates, but we assume that detection probability is the same for all individuals at site \( i \) and time \( t \) and detections are independent. Maximum likelihood estimation for the Royle-Nichols occupancy
model is available with the ## Implementation in JAGS## Simulating dataLet's begin by simulating a data set to work with. That process itself will clarify how the model works.
## Maximum likelihood estimation
We can now fit the Royle-Nichols model with the
The parameter estimates \( \lambda \) and \( \psi \) are a bit low for this realisation. We can compare with the standard psi(.)p(.) model:
The AIC value for this model is a lot higher, indicating that the Royle-Nichols model would give better predictions than the standard model. ## The JAGS modelHere is the JAGS model I used. Note that in this model
For this trial run I used a Uniform(0, 10) prior and will check to see if the upper limit constrains the posterior. It would be better to use a gamma prior with sensible parameters for a run with real data that you wished to report. This is saved in the file named "modelRN.txt" in the current working directory. ## Running the model from R
Here are the first few rows of the output:
For this data set, the estimate of \( \lambda \) and \( \psi \) are a bit low, as we saw for the ML estimates. I'd like to compare the estimates of \( N \) with the true values for each of the 200 sites, but plotting \( \hat{N} \) versus \( N \) (graph not shown) is not very informative because integers are involved. Instead, we'll tabulate the mean estimate \( \hat{N} \) for each of the true values of \( N \):
There are only 2 sites with >4 individuals, so we shouldn't take too much notice of those. The estimate for \( N = 0 \) is inevitably biased high (as any value >0 in the MCMC chain will immediately produce an estimate >0), those for \( N > 1 \) appear to be biased low, at least for this simulated sample. ## Investigating accuracy## Probability of occupancyI ran the simulation 1000 times (with different seeds) and estimated occupancy using both the RN model and the null model (the psi(.)p(.) model). Comparing the estimates with the data-generating value, \( \psi = 1 - e^{-\lambda} = 0.777 \), the null model showed a bias of -0.077 and the RN model was almost unbiased (0.006). Root mean squared errors (RMSEs) calculated using the data-generating value were 0.087 for the null model and 0.053 for the RN model. Some of the error here is due to sampling error, the fact
that the proportion of occupied sites in the sample is not the
same as the data-generating \( \psi \). Comparing estimates with
the proportion in sample, While doing this, I used the I also ran the ML analyses, using the occSS0 and occSSrn0 functions in wiqid. These gave very similar results, as one would expect as we are running the Bayesian analysis with flat priors. AIC for the RN model was lower than the null model for 86% of the samples. Now the acid test: how does the RN model perform when
probability of detection does Based on these simulations, choosing the correct model gives more accurate results and lower bias. AIC picks the right model >80% of the time, but not always. LOO and WAIC were almost always lower for the RN model, even when the data came from a null model. ## Numbers at each siteFor the same 1000 simulations of the RN model, I checked the estimates of the number of individuals available for detection at each site as I did above. All 1000 had sites with values of \( N \) between 0 and 4, and 931 simulations had sites with \( N =5 \). Higher values were uncommon, so are not included here. The mean estimates across all the simulations are in the table below:
For this particular scenario, it seems that the estimates are more or less unbiased for \( N = 2 \), but are positively biased for \( N < 2 \) and negatively biased for \( N > 2 \). The results would certainly be different with other values for \( \lambda \) and \( r \) and for different numbers of replicates. But the take home message would likely be similar: (a) don't regard the output as unbiased estimates of abundance, or even "number of animals available for detection", at each location; but (b) the estimates may still be useful to investigate trends in abundance with habitat covariates. | ||||||||||||||

Updated 3 Sept 2018 by Mike Meredith |