by jlaake » Mon Jan 14, 2008 7:14 pm
After my last posting I wondered what variance you would use if you were only interested in making inference about the sample or you sampled the entire population. After reviewing sections 4.2 (pg 86-87) and 4.4 (pg 96) in the book, the variance issue became a little clearer in my mind with respect to a variance for the population occupancy versus the sample occupancy. The two components of eq 4.4 are explained on the top of pg 87 but maybe the following will help further. What we want is the variance of the number of detected occupied sites s_d. Let s_o to be the true number of occupied sites in a sample of size s. s_d varies because the number of occupied sites s_o varies (the first term) and s_d varies due to the vagary of detection even with a fixed value of s_o (the second term). The variance of s_o across samples of size s, is s*Psi*(1-Psi). If you plug in the variance of s_o and the expected value of s_d = s_o p* (multiplication * is left out to avoid confusion) into 4.3 then you get the first term Psi (1-Psi)/s. For a given value of s_o, s_d is binomial with size s_o and probability p*, so the variance is s_o p*(1-p*). Now, if you plug that into 4.3 and take the expectation of s_o (E(s_o)=s Psi) then you get s Psi p* (1-p*) /s^2p*^2 = Psi (1-p*)/sp* which is the second term. This is a standard decomposition which is the variance of an expectation plus the expectation of a variance to get a total variance.
So why does that matter? Well, if you are only interested in making an inference about the sample in hand or the sample is the population (e.g., sampled all the ponds in a county for a turtle species), then you do not want to include the first term of the variance. It helps if you can imagine an extreme example. What if p*=1? Notice that the second term will be 0 if p*=1 and we would know whether each site was occupied or not in our sample. Then if we are only interested in that sample or we sampled the entire population there is no uncertainty about the proportion of sites occupied. This may seem to directly contradict the first sentence of the second paragraph on pg 87. However, it does not because that statement presumes that the inference is from a sample to a population and that the sample is a very small fraction of the population. If you sample the entire population then there is no variation in Psi across samples, so the variation can go to zero if p*=1. Note that one could propose an intermediate result in which the first term had a finite population correction factor if you sampled a significant fraction (but not all) of the population.
Anyhow, below I’ve included a modified version of the code I posted last time. Now the code includes both the confidence interval coverage for the estimate of the population occupancy and for the realized occupancy in the sample. The code could be modified to look at the impact of an fpc in a sampling design.
Code:
# Simulate a random sample of sites for occupancy models
# This function requires RMark package
#
#
# Arguments:
# nreps - number of simulation replicates
# use.popcov - if TRUE it uses population covariates for prediction
# beta0 - logit intercept for Psi
# beta1 - logit slope for covariate which is N(0,1)
# p - constant detection probability
# number.sites - number of sites in the population
# number.sites.sampled - number of sites sampled
# number.visits - number of visits to each site
#
# Value:
# cic - proportion of simulations in which confidence interval included true pop’n occupancy value
# cic.sample - proportion of simulations in which confidence interval included true
# sample occupancy value
# stderr.real – vector of nreps; standard error for sample occupancy estimate using eq 4.8
# stderr.delta - vector of nreps; standard error for sample occupancy estimate using delta
# method variance for p*
# stderr.exp – vector of nreps; standard error for popn occupancy estimate
#
# Author: Jeff Laake
# Date: 14 Jan 2008
# Now also gives confidence interval coverage for realized occupancy in the sample
# to demonstrate std error for realized occupancy in which the inference is limited
# solely to the sampled sites which includes the case in which all sites in the
# population are sampled (eg. occupancy of a turtle sp in all ponds within a county)
#
do.occsim=function(nreps,use.popcov=TRUE,beta0=0,beta1=1,p=0.5,number.sites=5000,
number.sites.sampled=200,number.visits=5)
{
require(RMark)
# set up arrays and change number.visits to K to match book
cic=0
cic.sample=0
stderr.vec.real=rep(0,nreps)
stderr.vec.delta=rep(0,nreps)
stderr.vec.exp=rep(0,nreps)
K=number.visits
# loop over the number of replicate simulations requested
for (i in 1:nreps)
{
cat("\n rep =",i)
# generate occupancy covariate for the population of sites (Normal mean =0, sd=1)
site.var=rnorm(number.sites,0,1)
# if the variance is to be calculated for the population then construct a
# a histogram density for the realized covariates in the population for this replicate
if(use.popcov)
{
f.var=hist(site.var,plot=F,breaks=100)
p.var=f.var$counts/sum(f.var$counts)
nval=length(f.var$counts)
val.var=(f.var$breaks[1:nval]+f.var$breaks[2:(nval+1)])/2
}
# compute the probability that the site is occupied given it's covariate value
site.p=beta0+beta1*site.var
site.p=exp(site.p)/(1+exp(site.p))
# draw from the binomial (bernoulli) distribution a 0/1 for each site with the
# site-specific occupancy probability
site.occ=rbinom(number.sites,1,site.p)
# select a random set of sites without replacement
sites.sampled=sample(1:number.sites,number.sites.sampled,replace=FALSE)
# compute the mean occupancy for the population
pao=mean(site.occ)
# compute the mean occupancy for the selected sites
sample.pao=mean(site.occ[sites.sampled])
# compute an observed detection history for each site regardless of occupancy
ch=matrix(rbinom(number.visits*number.sites.sampled,1,p),ncol=number.visits)
# multiply by occupancy site such that unoccupied sites all end up as 000...0 ch
ch=ch*site.occ[sites.sampled]
ch=apply(ch,1,paste,collapse="")
# create a dataframe for RMark
occ.data=data.frame(ch=ch,site.var=site.var[sites.sampled])
occ.data$ch=as.character(ch)
# fit the appropriate model depending on value of beta1
if(beta1==0)
model=mark(occ.data,model="Occupancy",model.parameters=list(Psi=list(formula=~1),p=list(formula=~1)),output=F,retry=1)
else
model=mark(occ.data,model="Occupancy",model.parameters=list(Psi=list(formula=~site.var),p=list(formula=~1)),output=F,retry=1)
# extract values and compute std errors
p.est=summary(model)$real$p[1]
pstar=1-(1-p.est)^K
s=number.sites.sampled
su=length(ch[ch==paste(rep("0",K),collapse="")])
sd=s-su
Psi=sd/(pstar*s)
varp=summary(model,se=T)$reals$p$se[1]^2
# delta method std error; assumes constant p
stderr.vec.delta[i]=sqrt(Psi*(1-pstar)/(s*pstar)+(sd/s)^2*varp*(K*p.est^(K-1))^2/pstar^4)
# variance from book; assumes constant p
stderr=sqrt(sd*(1-pstar)/(s*pstar)^2+(Psi*(1-pstar)^2*K*p.est)/(s*pstar*(pstar*(1-p.est)-K*p.est*(1-pstar))))
stderr.vec.real[i]=stderr
# record conf interval coverage for the realized occupancy of the sample
if(sample.pao >= Psi-1.96*stderr & sample.pao <= Psi + 1.96*stderr)
cic.sample=cic.sample+1
cat("\n cic.sample =",cic.sample/i)
# compute variance for estimate of Psi for the population using either
# population values of covariate (use.popcov=TRUE) or the sample values;
# the latter will underestimate the variance as currently coded.
if(use.popcov)
{
cc=covariate.predictions(model,data.frame(site.var=val.var),indices=6)
stderr=sqrt(sum(outer(p.var,p.var)*cc$vcv))
pao.est=sum(cc$estimates$estimate*p.var)
}
else
{
cc=covariate.predictions(model,data.frame(site.var=site.var[sites.sampled]),indices=6)
stderr=sqrt(sum(cc$vcv))/number.sites.sampled
pao.est=mean(cc$estimates$estimate) # could also use estimate of Psi above
}
stderr.vec.exp[i]=stderr
# compute confidence interval coverage for estimate of population occupancy
if(pao >= pao.est-1.96*stderr & pao <= pao.est +1.96*stderr)
cic=cic+1
cat("\n cic.population =",cic/i)
}
return(list(cic=cic/nreps,cic.sample=cic.sample/nreps,stderr.real=stderr.vec.real,stderr.delta=stderr.vec.delta,stderr.exp=stderr.vec.exp))
}
cic=do.occsim(1000,number.sites.sampled=200)