Bootstrap to average across levels of factor

Postby jCeradini » Fri Mar 10, 2017 6:48 pm

Howdy,

Re: estimating an average parameter value, with CI's, across the levels of a factor (group), such as the average phi in a model with a time effect on phi (e.g., section 6.15 in the MARK book).

I know getting such estimates via variance components (random effects) and MCMC is discussed in the MARK book (6.15 and appendices) and in some threads here. But I'm wondering whether a bootstrap approach is also valid and/or equivalent. My example below suggests that it is, but I'm hoping for feedback on whether it generalizes or just happens to work in this case. This question is somewhat academic, I guess, since clearly I could just use var.components, but I sometimes find a bootstrap easier to conceptualize and implement (besides the obvious downside of processing time).

My thinking was that the average of the mean parameter estimates for each group is a valid estimate of the overall average (right?), but the average of the SE's is not a valid estimate of the SE of that average. So: bootstrap to get a distribution of averaged means and use that distribution to estimate CI's for the average.
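That intuition can be checked with a toy example (the numbers below are made up, not from the dipper analysis). For k independent estimates, the SE of their average is sqrt(sum(se^2))/k, which is never larger than the plain average of the SE's; time-specific estimates from a single CJS fit are correlated, so treat this only as a sketch of the idea:

```r
# Hypothetical per-group SEs (independence assumed for illustration only)
se <- c(0.05, 0.10, 0.20)

mean(se)                      # naive average of the SEs: ~0.117
sqrt(sum(se^2)) / length(se)  # SE of the averaged estimate: ~0.076
```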

I compared the RMark var.components example with the dipper data to a bootstrap approach:
Code:
# RMark var.components example
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
library(RMark)

data(dipper)
md = mark(dipper, model.parameters = list(Phi = list(formula = ~time)))

# Real Phi estimates plus their variance-covariance matrix
zz = get.real(md, "Phi", vcv = TRUE)
z = zz$estimates$estimate[1:6]  # the six unique time-specific Phi estimates
vcv = zz$vcv.real

# Intercept-only design matrix: average across the six estimates
varc = var.components(z, design = matrix(rep(1, length(z)), ncol = 1), vcv)
varc


# Bootstrap approach
#~~~~~~~~~~~~~~~~~~~~~~~~~~
reals.ave.all <- vector() # empty vector to hold loop output

# LOOP
for(i in 1:1000){
  # Resample dipper DF with replacement
  boot.df <- dipper[sample(nrow(dipper), nrow(dipper), replace=TRUE), ]
 
  # Fit mod on resampled DF
  md = mark(boot.df, model.parameters = list(Phi=list(formula = ~time)), output = FALSE)
 
  # Extract unique reals and average means across time
  reals = get.real(md, "Phi", vcv=TRUE)
  reals = reals$estimates$estimate[1:6]
  reals.ave <- mean(reals)
 
  # Append each iteration's average as we go
  reals.ave.all <- rbind(reals.ave.all, reals.ave)

  # Unsophisticated progress "bar" so we don't wallow in uncertainty
  print(i)
}

# Compare boot and var.components estimates
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Means
mean(z) # observed mean of time means
#[1] 0.5623417

varc$beta$Estimate # mean from var.components
#[1] 0.5593609

mean(reals.ave.all) # mean of distribution of averaged means (ah!) from bootstrap
#[1] 0.5647425


# SE's
varc$beta$SE # SE from var.components
#[1] 0.03042626

sd(reals.ave.all) # SE from bootstrap
#[1] 0.03010916


# CI's
varc$beta$Estimate - (varc$beta$SE*1.96) # LCL from var.components
#[1] 0.4997255
varc$beta$Estimate + (varc$beta$SE*1.96) # UCL from var.components
#[1] 0.6189964

quantile(reals.ave.all, c(0.025, 0.975)) # CI's from bootstrap via "percentile" method ("bias-corrected" would be better)
#2.5%        97.5%
#0.5038039  0.6220677


Thanks!
Joe
jCeradini
 
Posts: 74
Joined: Mon Oct 13, 2014 3:53 pm

Re: Bootstrap to average across levels of factor

Postby jlaake » Tue Mar 14, 2017 10:52 am

Not sure this is the best subforum for this question; even though it uses RMark for the analysis, it is really a question about an analytical approach. I have a few thoughts here. First, the bootstrap doesn't give you the main goal of the variance components approach, which is an estimate of process variance. Second, while the bootstrap may work in this case and can seem conceptually straightforward, it is easy to perform the bootstrap incorrectly. For example, I would not have resampled across all histories; instead, I would have resampled within release cohort, so that the number released per cohort remained constant. Finally, the better way to approach this is with a mixed-effects model in which you estimate a mean and process variance as part of model fitting. I have implemented this in the marked package for CJS, and I'm currently working on Template Model Builder (TMB R package) versions of the CJS and multistate models. You may be able to do the same thing with MCMC in MARK.
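The within-cohort resampling described above could be sketched as follows. This is a hypothetical helper (`resample_within_cohort` is not code from the thread); it assumes a capture-history column `ch` as in RMark's dipper data, treats the occasion of first capture as the release cohort, and resamples rows within each cohort so the number released per cohort stays fixed:

```r
# Hypothetical sketch: stratified resampling within release cohort
resample_within_cohort <- function(df) {
  cohort <- regexpr("1", df$ch)  # occasion of first capture = release cohort
  idx <- unlist(lapply(split(seq_len(nrow(df)), cohort), function(i)
    # index into i explicitly so one-row cohorts are not mis-sampled
    i[sample.int(length(i), length(i), replace = TRUE)]))
  df[idx, , drop = FALSE]
}
```

Inside the bootstrap loop, `boot.df <- resample_within_cohort(dipper)` would then replace the resample across all histories.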
jlaake
 
Posts: 1480
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Bootstrap to average across levels of factor

Postby jCeradini » Tue Mar 14, 2017 12:31 pm

Thanks for the response, Jeff! Yea, I regretted posting to the RMark subforum afterwards - 'analysis and design questions' would have been better. I was lured to this subforum by my RMark code.
Would it be worth "transporting" (Evan?) this thread to 'analysis and design questions'?

Excellent point regarding what to resample. If these were my own data, which I was more familiar with, I likely would have resampled within release cohort, for example. But the point is certainly taken that how to resample is an important choice that could be done incorrectly.

The bootstrap doesn't provide an estimate of the process variance, so, as you said, the goal is different from that of var.components. But at least the SE's and CI's are not biased low, as they would be if we simply excluded time from the model and used the estimates from the constant model, which are:
estimate: 0.560243, se: 0.02513295, lcl: 0.5105493, ucl: 0.6087577

vs.

Code:
# SE from var.components
#[1] 0.03042626

# SE from bootstrap
#[1] 0.03010916

# LCL and UCL from var.components
# 0.4997255, 0.6189964

# CI's from bootstrap via "percentile" method ("bias-corrected" would be better)
#0.5038039, 0.6220677

Not a big difference in estimates here, but I imagine it could be more exaggerated depending on the data. I also estimated the CI's via the bias-corrected percentile method (p. 48 in Randomization, Bootstrap and Monte Carlo Methods in Biology - Manly), which returns results notably different from any of the estimates shown here. But I need to double-check that I'm implementing it correctly before throwing those estimates into the mix.
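The bias-corrected percentile method mentioned above could be sketched as below. `bc_ci` is a hypothetical helper name, not code from the thread; it shifts the percentile points by a bias-correction factor z0 based on the fraction of bootstrap values below the observed estimate, and reduces to the plain percentile method when that fraction is 0.5:

```r
# Hypothetical helper: bias-corrected percentile CI for a bootstrap sample
bc_ci <- function(boot, obs, conf = 0.95) {
  z0 <- qnorm(mean(boot < obs))      # bias-correction factor
  zc <- qnorm(1 - (1 - conf) / 2)    # ~1.96 for a 95% interval
  probs <- pnorm(c(2 * z0 - zc, 2 * z0 + zc))  # shifted percentile points
  quantile(boot, probs, names = FALSE)
}

# e.g. bc_ci(reals.ave.all, mean(z)) with the objects from the earlier code
```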

I agree random effects makes more sense, and am excited that more models are being implemented in marked!

Thanks again.

