www.phidot.org

by **ezylstra** » Mon Jan 07, 2008 2:34 pm

I just updated to a newer version of presence. In the older version, outputs included an overall proportion of sites occupied with standard error and average detection probabilities (by survey). I don't see these estimates in presence 2.1 output. Is there a reason for that?

Additionally, when I looked at those older outputs, I'm not sure how the standard error on that overall psi was calculated. I just want to make sure that the standard error is appropriate for some of my subsequent analyses.

Thanks!

by **jhines** » Mon Jan 07, 2008 4:15 pm

Dear Erin,

The 'overall estimate of sites occupied' in the old output is the same as the individual estimate of 'psi' for the first survey/sample when there are no covariates. If you have covariates which cause psi (occupancy) to be different for each site, you could get an overall estimate by having the program print all estimates, and taking the average of all of the estimates and the average of the standard errors. I think I re-labeled the output to make it clearer, and removed the 'average' estimates since those numbers weren't meaningful for covariate models (or, I should probably say that the 'beta' estimates are the more meaningful numbers in that case).

Jim

by **ezylstra** » Mon Jan 07, 2008 4:30 pm

Jim -
I guess I'm still a little confused. I'm using a model with several covariates, but I still wanted the overall estimate of psi and its standard error. I understood that the overall estimate of psi is calculated by taking the average of the individual site estimates. However, I'm not sure where the standard error in the old output came from. It isn't the average of the the individual standard errors, and it isn't the standard error of the individual estimates of psi.
I recently used this overall estimate of psi and its standard error for a series of power analyses, but now I'm worried that the standard error from the old output may not be appropriate.
Thanks for your help!

by **darryl** » Mon Jan 07, 2008 4:50 pm

Erin,
I also suggested to Jim that the 'overall' estimate be removed because I frequently found people were misusing/misinterpreting it. It was the average probability of occupancy at the surveyed sites (and the covariate values at those sites). If you want to generalise that value to other places that haven't been surveyed then you need to start making assumptions about the observed covariate values being 'representative' of the distribution of covariate values everywhere else. This comes through having a good sampling scheme.

What, exactly, is the quantity you're after?

As for standard errors, it again depends on what quantity you're interested in, but in the old output I think it was calculated as SE = sqrt(sum(VAR[i]))/N, where VAR[i] is the variance for each site-specific estimate of psi (ie SE^2).

Cheers
Darryl

by **jlaake** » Wed Jan 09, 2008 6:59 pm

I can't speak to what is in PRESENCE but I wanted to address the question that was asked because neither of the last replies was quite correct. Darryl was close but in his message he forgot to include the covariances that occur because the predictions are all based on the same model. If you have K sites with different covariate values then you can construct a KxK variance-covariance matrix (V) for the Psi estimates. Then the standard error for the average Psi is the square root of (sum of all the elements in V divided by N^2). What Darryl wrote was the sum of all the diagonal elements of V which will be considerably less. What Jim wrote was the average of the square root of the diagonal elements and that will be too large.
There is some question about how this variance should be used in the inference and that’s why Darryl had Jim remove it. At first I thought that it would pertain solely to the visited sites but after thinking about the simpler problem where p=1 and with no covariate (binomial probability), it became clear to me that it is a variance for inference to a broader area as long as sites were sampled randomly. However, when you add a covariate it doesn’t work as well as you would like. To investigate this, I wrote a simulation for R using RMark which uses the occupancy model in MARK. The code is shown below if anyone wants to use it to explore further. What I found was that the std error with a 95% confidence interval (est +/- 1.96*std err) provides the correct nominal coverage for the population occupancy proportion (pao) with p<1 with Psi(.). However, when I included a covariate for Psi, then the observed conf interval coverage was slightly low at about 92% with 1000 reps. The reason is because the variation in the sample of covariate values is not included. One way to avoid this is to compute the average occupancy based on the covariate values for all of the sites in the population and not just the visited ones. If you do that, then you do get the correct nominal 95% coverage (use.popcov=TRUE in the simulation). As a simple example, this could be the amount of area (contained in the larger sampled region and not just the sites) with browse=0 and browse=1 in the weta example. If p0 is the proportion of area with browse=0, so 1-p0 is proportion with browse=1. Then we’ll call pi=vector containing p0,1-p0. Let Psi_0 and Psi_1 be the predictions for browse=0 and browse=1 and let V be the 2x2 variance-covariance matrix for those estimates. Then the average Psi is p0*Psi_0+(1-p0)*Psi_1. The variance is simply
p0^2*var(Psi_0)+ (1-p0)^2*var(Psi_1)+2*p0*(1-p0)*cov(Psi_0,Psi_1)
and the std error is the square root of the variance. The above formula can also be written in matrix form as
pi’ V pi
where pi’ is the transpose of pi. To relate to the formula given above if pi=1/N for each covariate prediction then you get the sum of all the elements in V divided by N^2 to get the variance. Now if your covariate is continuous you can get an approximation like in the code below where it computes a frequency in bins and computes at the interval midpoint to the proportions for pi.
Now if you don’t know the covariate value for the entire inference region, then you can also propose a distribution for the covariate value, fit it to the sample and incorporate the uncertainty in the distribution parameters. There is also probably a way to do that non-parametrically (ie histogram or kernel). This approach is fine as long as you sampled randomly. However, if your sites were selected such that these covariate values are not representative of the population (eg you preselected more browsed areas than are in the population) then you really must use the population approach described above to get a valid pao point and interval estimate.

# Simulate a random sample of sites for occupancy models
# This function requires RMark package
#
#
# Arguments:
# nreps - number of simulation replicates
# use.popcov - if TRUE it uses population covariates for prediction
# beta0 - logit intercept for Psi
# beta1 - logit slope for covariate which is N(0,1)
# p - constant detection probability
# number.sites - number of sites in the population
# number.sites.sampled - number of sites sampled
# number.visits - number of visits to each site
#
# Value:
# cic - proportion of simulations in which confidence interval included true value
# cichi - number of times true value was below interval estimate
# ciclo - number of times true value was above interval estimate
# pao - average pao of the simulations
#
# Author: Jeff Laake
# Date: 9 Jan 2008
#
do.occsim=function(nreps,use.popcov=TRUE,beta0=0,beta1=1,p=0.5,number.sites=5000,
number.sites.sampled=200,number.visits=5)
{
require(RMark)
cic=0
cichi=0
ciclo=0
cic.sample=0
avg.pao=0
for (i in 1:nreps)
{
cat("\n rep =",i)
site.var=rnorm(number.sites,0,1)
if(use.popcov)
{
f.var=hist(site.var,plot=F,breaks=100)
p.var=f.var$counts/sum(f.var$counts)
nval=length(f.var$counts)
val.var=(f.var$breaks[1:nval]+f.var$breaks[2:(nval+1)])/2
}
site.p=beta0+beta1*site.var
site.p=exp(site.p)/(1+exp(site.p))
site.occ=rbinom(number.sites,1,site.p)
sites.sampled=sample(1:number.sites,number.sites.sampled,replace=FALSE)
pao=mean(site.occ)
ch=matrix(rbinom(number.visits*number.sites.sampled,1,p),ncol=number.visits)
ch=ch*site.occ[sites.sampled]
ch=apply(ch,1,paste,collapse="")
occ.data=data.frame(ch=ch,site.var=site.var[sites.sampled])
occ.data$ch=as.character(ch)
model=mark(occ.data,model="Occupancy",model.parameters=list(Psi=list(formula=~site.var),p=list(formula=~1)),output=F)
if(use.popcov)
{
cc=covariate.predictions(model,data.frame(site.var=val.var),indices=6)
stderr=sqrt(sum(outer(p.var,p.var)*cc$vcv))
pao.est=sum(cc$estimates$estimate*p.var)
}
else
{
cc=covariate.predictions(model,data.frame(site.var=site.var[sites.sampled]),indices=6)
stderr=sqrt(sum(cc$vcv))/number.sites.sampled
pao.est=mean(cc$estimates$estimate)
}
avg.pao=avg.pao+pao.est/nreps
if(pao >= pao.est-1.96*stderr & pao <= pao.est +1.96*stderr)
{
cic=cic+1
}
else
{
if(pao < pao.est-1.96*stderr)
{
cichi=cichi+1
cat("\n cichi ",cichi)
}
else
{
ciclo=ciclo+1
cat("\n ciclo ",ciclo)
}
}
cat("\n cic =",cic/i)
}
return(list(cic=cic/nreps,cichi=cichi,ciclo=ciclo,pao=avg.pao))
}
cic=do.occsim(1000)

by **jlaake** » Mon Jan 14, 2008 7:14 pm

After my last posting I wondered what variance you would use if you were only interested in making inference about the sample or you sampled the entire population. After reviewing sections 4.2 (pg 86-87) and 4.4 (pg 96) in the book, the variance issue became a little clearer in my mind with respect to a variance for the population occupancy versus the sample occupancy. The two components of eq 4.4 are explained on the top of pg 87 but maybe the following will help further. What we want is the variance of the number of detected occupied sites s_d. Let s_o to be the true number of occupied sites in a sample of size s. s_d varies because the number of occupied sites s_o varies (the first term) and s_d varies due to the vagary of detection even with a fixed value of s_o (the second term). The variance of s_o across samples of size s, is s*Psi*(1-Psi). If you plug in the variance of s_o and the expected value of s_d = s_o p* (multiplication * is left out to avoid confusion) into 4.3 then you get the first term Psi (1-Psi)/s. For a given value of s_o, s_d is binomial with size s_o and probability p*, so the variance is s_o p*(1-p*). Now, if you plug that into 4.3 and take the expectation of s_o (E(s_o)=s Psi) then you get s Psi p* (1-p*) /s^2p*^2 = Psi (1-p*)/sp* which is the second term. This is a standard decomposition which is the variance of an expectation plus the expectation of a variance to get a total variance.
So why does that matter? Well, if you are only interested in making an inference about the sample in hand or the sample is the population (e.g., sampled all the ponds in a county for a turtle species), then you do not want to include the first term of the variance. It helps if you can imagine an extreme example. What if p*=1? Notice that the second term will be 0 if p*=1 and we would know whether each site was occupied or not in our sample. Then if we are only interested in that sample or we sampled the entire population there is no uncertainty about the proportion of sites occupied. This may seem to directly contradict the first sentence of the second paragraph on pg 87. However, it does not because that statement presumes that the inference is from a sample to a population and that the sample is a very small fraction of the population. If you sample the entire population then there is no variation in Psi across samples, so the variation can go to zero if p*=1. Note that one could propose an intermediate result in which the first term had a finite population correction factor if you sampled a significant fraction (but not all) of the population.
Anyhow, below I’ve included a modified version of the code I posted last time. Now the code includes both the confidence interval coverage for the estimate of the population occupancy and for the realized occupancy in the sample. The code could be modified to look at the impact of an fpc in a sampling design.
Code:
# Simulate a random sample of sites for occupancy models
# This function requires RMark package
#
#
# Arguments:
# nreps - number of simulation replicates
# use.popcov - if TRUE it uses population covariates for prediction
# beta0 - logit intercept for Psi
# beta1 - logit slope for covariate which is N(0,1)
# p - constant detection probability
# number.sites - number of sites in the population
# number.sites.sampled - number of sites sampled
# number.visits - number of visits to each site
#
# Value:
# cic - proportion of simulations in which confidence interval included true pop’n occupancy value
# cic.sample - proportion of simulations in which confidence interval included true
# sample occupancy value
# stderr.real – vector of nreps; standard error for sample occupancy estimate using eq 4.8
# stderr.delta - vector of nreps; standard error for sample occupancy estimate using delta
# method variance for p*
# stderr.exp – vector of nreps; standard error for popn occupancy estimate
#
# Author: Jeff Laake
# Date: 14 Jan 2008
# Now also gives confidence interval coverage for realized occupancy in the sample
# to demonstrate std error for realized occupancy in which the inference is limited
# solely to the sampled sites which includes the case in which all sites in the
# population are sampled (eg. occupancy of a turtle sp in all ponds within a county)
#
do.occsim=function(nreps,use.popcov=TRUE,beta0=0,beta1=1,p=0.5,number.sites=5000,
number.sites.sampled=200,number.visits=5)
{
require(RMark)
# set up arrays and change number.visits to K to match book
cic=0
cic.sample=0
stderr.vec.real=rep(0,nreps)
stderr.vec.delta=rep(0,nreps)
stderr.vec.exp=rep(0,nreps)
K=number.visits
# loop over the number of replicate simulations requested
for (i in 1:nreps)
{
cat("\n rep =",i)
# generate occupancy covariate for the population of sites (Normal mean =0, sd=1)
site.var=rnorm(number.sites,0,1)
# if the variance is to be calculated for the population then construct a
# a histogram density for the realized covariates in the population for this replicate
if(use.popcov)
{
f.var=hist(site.var,plot=F,breaks=100)
p.var=f.var$counts/sum(f.var$counts)
nval=length(f.var$counts)
val.var=(f.var$breaks[1:nval]+f.var$breaks[2:(nval+1)])/2
}
# compute the probability that the site is occupied given it's covariate value
site.p=beta0+beta1*site.var
site.p=exp(site.p)/(1+exp(site.p))
# draw from the binomial (bernoulli) distribution a 0/1 for each site with the
# site-specific occupancy probability
site.occ=rbinom(number.sites,1,site.p)
# select a random set of sites without replacement
sites.sampled=sample(1:number.sites,number.sites.sampled,replace=FALSE)
# compute the mean occupancy for the population
pao=mean(site.occ)
# compute the mean occupancy for the selected sites
sample.pao=mean(site.occ[sites.sampled])
# compute an observed detection history for each site regardless of occupancy
ch=matrix(rbinom(number.visits*number.sites.sampled,1,p),ncol=number.visits)
# multiply by occupancy site such that unoccupied sites all end up as 000...0 ch
ch=ch*site.occ[sites.sampled]
ch=apply(ch,1,paste,collapse="")
# create a dataframe for RMark
occ.data=data.frame(ch=ch,site.var=site.var[sites.sampled])
occ.data$ch=as.character(ch)
# fit the appropriate model depending on value of beta1
if(beta1==0)
model=mark(occ.data,model="Occupancy",model.parameters=list(Psi=list(formula=~1),p=list(formula=~1)),output=F,retry=1)
else
model=mark(occ.data,model="Occupancy",model.parameters=list(Psi=list(formula=~site.var),p=list(formula=~1)),output=F,retry=1)
# extract values and compute std errors
p.est=summary(model)$real$p[1]
pstar=1-(1-p.est)^K
s=number.sites.sampled
su=length(ch[ch==paste(rep("0",K),collapse="")])
sd=s-su
Psi=sd/(pstar*s)
varp=summary(model,se=T)$reals$p$se[1]^2
# delta method std error; assumes constant p
stderr.vec.delta[i]=sqrt(Psi*(1-pstar)/(s*pstar)+(sd/s)^2*varp*(K*p.est^(K-1))^2/pstar^4)
# variance from book; assumes constant p
stderr=sqrt(sd*(1-pstar)/(s*pstar)^2+(Psi*(1-pstar)^2*K*p.est)/(s*pstar*(pstar*(1-p.est)-K*p.est*(1-pstar))))
stderr.vec.real[i]=stderr
# record conf interval coverage for the realized occupancy of the sample
if(sample.pao >= Psi-1.96*stderr & sample.pao <= Psi + 1.96*stderr)
cic.sample=cic.sample+1
cat("\n cic.sample =",cic.sample/i)
# compute variance for estimate of Psi for the population using either
# population values of covariate (use.popcov=TRUE) or the sample values;
# the latter will underestimate the variance as currently coded.
if(use.popcov)
{
cc=covariate.predictions(model,data.frame(site.var=val.var),indices=6)
stderr=sqrt(sum(outer(p.var,p.var)*cc$vcv))
pao.est=sum(cc$estimates$estimate*p.var)
}
else
{
cc=covariate.predictions(model,data.frame(site.var=site.var[sites.sampled]),indices=6)
stderr=sqrt(sum(cc$vcv))/number.sites.sampled
pao.est=mean(cc$estimates$estimate) # could also use estimate of Psi above
}
stderr.vec.exp[i]=stderr
# compute confidence interval coverage for estimate of population occupancy
if(pao >= pao.est-1.96*stderr & pao <= pao.est +1.96*stderr)
cic=cic+1
cat("\n cic.population =",cic/i)
}
return(list(cic=cic/nreps,cic.sample=cic.sample/nreps,stderr.real=stderr.vec.real,stderr.delta=stderr.vec.delta,stderr.exp=stderr.vec.exp))
}
cic=do.occsim(1000,number.sites.sampled=200)

www.phidot.org

overall psi estimate

overall psi estimate

estimate of overall occupancy

Variance for average Psi

More on variance of Psi

Who is online