## Interpreting non-detection when observations are imperfect

How long should you search before deciding that your target is absent? This problem is fundamental to any ecological study that aims to establish the absence of a species. It matters in studies of extinction, and when determining community composition. It also has applied relevance – how much search effort should be required in environmental impact statements, or when looking for an invasive species?

The possibility of imperfect detectability (a topic I have covered in a few previous posts) means that a species might not be recorded in a survey even when it is present.

Methods to estimate the probability of detection for a given level of search effort have been available for some time. Kirsten Parris was one of the first people I know to start thinking about this, which she pursued during her PhD in a pilot study of the effectiveness of different survey methods for frogs (Parris et al. 1999). It turned out that the question of imperfect surveys had been considered some years earlier: McArdle (1990) asked, given a level of survey effort, what is the probability that a species will not be detected when present?

But Brendan Wintle and I started thinking: the probability of failing to detect a species given it is present is not the same as the probability that the species is present given it is not detected. To illustrate, think of rolling a fair six-sided die. The probability of rolling a one is 1/6. But the probability that I am using a fair six-sided die given I roll a one is not 1/6. The latter probability depends on other information, such as how much you trust me. The same idea applies to ecological surveys. We might be able to calculate the probability of non-detection given presence, but the probability of presence given non-detection is different. The latter, which seems more relevant in many cases, is the inverse of the former, but the two are often confused (see David Pannell’s blog post about this).

Prior information is important when determining the probability of presence given non-detection. For example, we wouldn’t require any survey effort to be almost certain that dodos are not roaming wild in Melbourne; a cursory glance would be more than sufficient. However, a night of survey effort that failed to find any common brushtail possums, which are well known in Melbourne, would not be compelling evidence that they are absent from the city. The reason for this different attitude to evidence is that other information exists to modify our belief about whether the species are present in Melbourne.

In a Bayesian context, this extra information enters via the prior probability of presence. In the case of the dodo, the prior probability of presence is so close to zero that no survey effort is required to support a belief in the species’ absence. In the case of the common brushtail possum, a single night of survey without seeing the species might simply be viewed as an aberration.

A few years ago, Brendan and I worked out the amount of survey effort that is required to be sufficiently sure that a species is absent. When we presented this solution at the Bayesian workshops for ecologists that we run, we would always say “We really should publish that.” Well, with the help of Kirsten Parris and Terry Walshe, now we have (Wintle et al. 2012).

Terry describes the idea and explores the inverse probability fallacy in an article in Decision Point, the monthly magazine of the Environmental Decisions Group. Check it out. Here I give an overview of some of the technical details, and derive an intuitive interpretation of our result.

If we failed to find a species in a survey that has a particular reliability, we can determine the posterior probability of presence given non-detection, Pr(p | 0), using Bayes’ rule (assuming no false presences, only false absences):

Pr(p | 0) = Pr(p)×Pr(0 | p) / [Pr(p)×Pr(0 | p) + 1 – Pr(p)].

Here, Pr(p) is the prior probability of presence, and Pr(0 | p) is the probability of failing to detect the species with the given survey effort when the species is in fact present. The prior probability of presence could be a subjective probability derived from expert elicitation or a species distribution model fitted to independent data.
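This calculation is easy to sketch in code. The snippet below is a minimal illustration in Python; the function name `posterior_presence` is mine, not from the paper, and the prior values are invented for the dodo and possum examples above.

```python
def posterior_presence(prior, pr_nondetect_given_present):
    """Pr(presence | non-detection) via Bayes' rule,
    assuming no false presences (only false absences)."""
    num = prior * pr_nondetect_given_present
    return num / (num + 1 - prior)

# Dodo-like case: a near-zero prior stays near zero even after a weak survey
print(posterior_presence(0.001, 0.5))  # ≈ 0.0005

# Possum-like case: a high prior barely moves after one failed survey
print(posterior_presence(0.95, 0.5))   # ≈ 0.90
```

Note how the same survey reliability (Pr(0 | p) = 0.5) yields very different posteriors: the prior does most of the work, exactly as in the dodo and possum examples.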

One way to increase the reliability of a survey (i.e., decrease Pr(0 | p)) is to conduct more surveys. If d is the per-visit detection probability, then Pr(0 | p) after n consecutive independent surveys is equal to (1 – d)^n. Substituting this expression for Pr(0 | p) into the above equation yields:

Pr(p | 0) = Pr(p) × (1 – d)^n / [Pr(p) × (1 – d)^n + 1 – Pr(p)].
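A quick way to see the effect of repeated surveys is to evaluate this expression for increasing n. The sketch below uses illustrative values (prior = 0.5, d = 0.3); the helper name is hypothetical.

```python
def posterior_after_n(prior, d, n):
    """Pr(presence | n independent non-detections),
    with per-visit detection probability d."""
    miss = (1 - d) ** n  # probability of missing the species on all n visits
    return prior * miss / (prior * miss + 1 - prior)

# The posterior probability of presence shrinks with every failed survey
for n in (1, 5, 10):
    print(n, posterior_after_n(0.5, 0.3, n))
```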

Now, let’s assume that we are trying to prove the absence of a species. We can then determine how large n must be so that, if the species is not detected, the posterior probability of presence falls below a value that is deemed acceptably small. Let’s call this acceptable posterior probability of presence “A”.

If Pr(p | 0) is required to be less than A, then re-arranging the above formula leads to

n > {ln[A/(1–A)] – ln[Pr(p) /(1–Pr(p))]} / ln(1 – d).
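Since n must be a whole number of surveys, the practical answer is the smallest integer strictly greater than this bound. A minimal Python sketch (the function name is mine; the example values are illustrative):

```python
import math

def surveys_needed(prior, d, A):
    """Smallest whole number of non-detecting surveys so that
    Pr(presence | non-detection) falls below A."""
    bound = (math.log(A / (1 - A)) - math.log(prior / (1 - prior))) / math.log(1 - d)
    # n must strictly exceed the bound, so round down and add one
    return math.floor(bound) + 1

# With a 50:50 prior, d = 0.3 per visit, and A = 0.05:
print(surveys_needed(0.5, 0.3, 0.05))  # 9
```

So with these values, nine failed surveys are needed before the posterior probability of presence drops below 5%.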

This might look a little daunting, but in fact it is quite intuitive. The first term in the numerator (ln[A/(1–A)]) is the logit of the required posterior probability of presence, and the second term (ln[Pr(p) /(1–Pr(p))]) is the logit of the prior probability of presence. So the numerator is simply the difference between the prior probability of presence and the required posterior probability of presence (but measured on the logit scale). This difference measures the weight of evidence that is required from the surveys.

And the denominator (ln(1 – d)) is simply equal to –R, where R is the rate of detection per survey under a case where detections occur randomly (i.e., R is the expected number of detections of the species during a survey; this can be derived from a Poisson process in which the probability of at least one detection is given by d = 1 – exp(–R)).

So substituting these into the above, we have:

n > W / R,

where W is the required reduction in the logit of the probability of presence following n surveys (ln[Pr(p) /(1–Pr(p))] – ln[A/(1–A)]) and R is the expected number of detections in a single survey.
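The W/R form can be computed directly, and it agrees with the full rearranged formula. The sketch below uses hypothetical function names and the same illustrative values as before (prior = 0.5, d = 0.3, A = 0.05):

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def min_effort(prior, d, A):
    """Required survey effort as W / R: the weight of evidence needed,
    divided by the expected number of detections per survey."""
    W = logit(prior) - logit(A)  # required reduction in logit of presence
    R = -math.log(1 - d)         # expected detections per survey, d = 1 - exp(-R)
    return W / R                 # required n strictly exceeds this value

print(min_effort(0.5, 0.3, 0.05))  # ≈ 8.26, so 9 surveys
```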

How simple is that? The required number of surveys increases with the weight of evidence required (expressed as W) and declines with the quality of each survey (expressed as R).

The consequence is that both the probability of detection and the prior probability of presence can have a large influence on the required number of surveys (Fig. 1). Notably, the required number of surveys can be very large when the probability of detection is small and the prior probability of presence is high. This is not something that is easy to judge intuitively, so most people will need the equation.

Fig. 1. Observation effort required to be 95% sure that a species is absent from a particular site (A = 0.05) versus the prior (before data) belief that the species occupies the site. Three lines correspond to three different single-visit detection probabilities (d = 0.1, 0.3 and 0.5). The prior belief in occupancy could be a subjective probability derived from expert elicitation or a species distribution model fitted to independent data (from Wintle et al. 2012).

References

McArdle, B.H. (1990). When are rare species not there? Oikos 57: 276-277.

Parris, K.M., Norton, T.W., and Cunningham, R.B. (1999). A comparison of techniques for sampling amphibians in the forests of south-east Queensland, Australia. Herpetologica 55: 271-283.

Wintle, B.A., Walshe, T.V., Parris, K.M., and McCarthy, M.A. (2012). Designing occupancy surveys and interpreting non-detection when observations are imperfect. Diversity and Distributions 18: 417-424.