non-singular p's near boundaries, N^ unrealistic

questions concerning analysis/theory using program MARK

Postby howeer » Thu Nov 29, 2007 4:48 pm

With sparse closed-captures data, models including trap responses and/or individual heterogeneity often yield unrealistic abundance (N^) estimates, even when all beta variances are positive. The unrealistic N^'s seem to be tied to p or c estimates near boundaries: one of the two mixtures has p^ near 0, or an apparent trap-happy response (c^ high, p^ near 0) leads to overestimation of N^, or an apparent trap-shy response pushes p^ toward 1 so that N^ = Mt+1 with apparently high precision. Using the retry argument in RMark to reset initial values didn't improve estimates from our data sets.
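For concreteness, the kind of single-area fitting I mean looks roughly like this (a sketch only; 'bears' stands in for one area's capture-history data frame, and the model names and formula conventions follow my reading of RMark's closed-capture examples):

Code:
  library(RMark)

  # full-likelihood closed captures, no heterogeneity (parameters p, c, f0)
  cl.proc <- process.data(bears, model = "Closed")
  cl.ddl  <- make.design.data(cl.proc)

  # finite-mixture heterogeneity version (parameters pi, p, c, f0)
  het.proc <- process.data(bears, model = "FullHet")
  het.ddl  <- make.design.data(het.proc)

  # M0: constant p, shared with c
  m0 <- mark(cl.proc, cl.ddl,
             model.parameters = list(p = list(formula = ~1, share = TRUE)),
             retry = 2)

  # Mb: trap response, via the 'c' column added to the p design data when share = TRUE
  mb <- mark(cl.proc, cl.ddl,
             model.parameters = list(p = list(formula = ~1 + c, share = TRUE)),
             retry = 2)

  # Mh: two-point mixture heterogeneity in p
  mh <- mark(het.proc, het.ddl,
             model.parameters = list(p = list(formula = ~mixture, share = TRUE)),
             retry = 2)

  # retry = 2 re-runs a model with new starting values if MARK reports
  # singular parameters -- the step that didn't rescue our estimates
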
Such models may have low AICc values relative to simpler models that yield plausible estimates, even when parameters are counted correctly or adjusted upwards. The underlying problem is probably that the data are inadequate to support models including these effects. However, if one includes them in a candidate set and finds they have low AICc, how does one proceed with estimation? Does one infer (1) that the effect(s) is supported but the data are inadequate to estimate the parameters, so that no reliable N estimate can be obtained unless an alternative estimator for the same model (e.g., the Mh jackknife) is available? Or (2) that the data are insufficient to test for and model the effect, and then estimate N using a simpler model? Simpler models tend to produce estimates with wide CIs in these cases, which seems appropriate given sparse data.
Also, even one of these models in a candidate set can bias model-averaged estimates. Is it appropriate to average across only a subset of the models in the a priori set? To model-average at all? If we average across only a subset, have we really accounted for model selection uncertainty?
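Mechanically, averaging over a subset is straightforward in RMark; whether it is defensible is the real question. Something like the following (illustrative only -- the marklist name and the index of the model being dropped are made up):

Code:
  # collect the fitted models in the workspace into a marklist
  results <- collect.models()
  model.table(results)                    # AICc ranking

  # drop the offending model(s) by index before averaging
  reduced <- remove.mark(results, c(2))   # the index 2 is purely illustrative

  # model-averaged f0; N^ = Mt+1 + f0^, since Mt+1 is known
  avg.f0 <- model.average(reduced, parameter = "f0", vcv = TRUE)
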
Sorry about the length.

Postby darryl » Thu Nov 29, 2007 5:21 pm

My thoughts are that no model selection procedure absolves you from making a judgment call at some point or exercising some common sense. Ideally you'll do this a priori by limiting your set of candidate models, but sometimes, particularly with sparse data, you may have to do it a posteriori. You just have to be honest about it and justify your reasons for excluding a model, even if it ranks highly. The fact that you're getting biologically unreasonable estimates is, to me, good justification. It may be that, because the data are sparse, some aspect of the data is making a particular model look really good even though the effect isn't 'real'. It's a fine line, though, between tossing results that are biologically unreasonable and tossing results that you just don't like.

You also might want to consider what the results are implying: having p^ near 0 says that most animals are essentially invisible to your sampling.

Cheers
Darryl

Postby howeer » Thu Nov 29, 2007 5:35 pm

Hi Darryl, thanks for your comments.
We have a lot of single captures, which is to be expected with only 4 occasions and fairly low overall p. If we also get a few animals caught 3 or 4 times, the null model doesn't fit well, but there's little information to distinguish among a trap response, individual heterogeneity, or just an underlying null model plus sampling error.

I expect the complex models will have to be excluded a posteriori, because some of our data sets are less sparse and can support complex models, and we want to use the same candidate set for all data sets. We'd only consider discarding models with estimates that are completely implausible (e.g., 100 times the expected N^).
I'm still left wondering what kind of inference we can make after excluding models, especially from estimates averaged across a subset of candidate models.

in addition

Postby cooch » Thu Nov 29, 2007 5:41 pm

I concur with Darryl, with the following additional observation. The premise of using AIC as a 'weight' for handling model selection uncertainty rests in part on the notion that the candidate set of approximating models is derived a priori, and that the models themselves are chosen for their biological plausibility. While most people seem to have 'bought into' this basic paradigm (by and large), for closed population estimation people revert much of the time to classical data dredging (e.g., all-possible-models comparisons in CAPTURE): simply fit a whole slew of models, with no apparent rationale for why each model is fit, and then average across them. Far too often I run into examples where models are fit with little to no thought given to whether they're biologically plausible. The advent of finite mixture models only makes this easier to do, and it is fair to wonder how one could make a plausibility argument for a naive finite mixture model ('hey, let's see if one with 3 groups has a better AIC!'), at least in some cases.

So, beyond the point at hand, a general suggestion: think hard about the models that are fit, period. Closed population estimation is no exception.

Postby howeer » Thu Nov 29, 2007 6:06 pm

Thanks Evan,
I agree that it's easy, and common, to casually model-average across large candidate model sets. I suspect this is sometimes done even when some of the models have parameters that aren't estimable (MARK allows this; RMark drops such models or gives warnings). However, analysts are open to criticism both for including and for excluding models/effects from candidate sets. Both individual heterogeneity and trap responses have been documented previously in black bears, so we were reluctant to exclude such effects a priori, even though we didn't expect trap responses with our sampling design (passive DNA sampling with minimal food reward) and some of our data sets were sparse. All the models in the set are biologically realistic; we just don't always have the data to support them.

We had multiple data sets (areas across a large landscape). Trap responses were supported only in the sparsest data sets and led to unrealistic estimates. With more data, time + mixture models or special cases thereof were supported (some mixture models yielded unrealistic estimates, but I expected that). My current plan is to present no estimates from the sparse data sets with apparent trap responses, but also to fit time + mixture models (including the null model and other special cases), with and without group effects, to data pooled across the groups that have more data (recaptures) and similar forms of p variation; a rough sketch of what I mean is below. Care to comment on this approach?
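Roughly, the pooled candidate set I have in mind would be built something like this (a sketch only -- 'bears.all' with a factor column 'area' stands in for the combined data, and the set shown is trimmed down for illustration):

Code:
  # pooled data, finite-mixture heterogeneity model, 'area' as a grouping factor
  pooled.proc <- process.data(bears.all, model = "FullHet", groups = "area")
  pooled.ddl  <- make.design.data(pooled.proc)

  fit.pooled <- function() {
    # time + mixture variation in p, with and without area effects
    p.dot           <- list(formula = ~1, share = TRUE)            # null special case
    p.time.mix      <- list(formula = ~time + mixture, share = TRUE)
    p.area.time.mix <- list(formula = ~area + time + mixture, share = TRUE)
    # area-specific f0, so N^ is estimated separately for each area
    f0.area         <- list(formula = ~area)
    cml <- create.model.list("FullHet")
    mark.wrapper(cml, data = pooled.proc, ddl = pooled.ddl, retry = 2)
  }
  pooled.results <- fit.pooled()
  model.table(pooled.results)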

Postby darryl » Thu Nov 29, 2007 7:24 pm

Eric,
If you have data from different areas, rather than analyzing each area individually (as it sounds you're doing at present), why not analyze them all at the same time? The advantage is that you could fit models in which some of the effects are shared across multiple areas. This could be particularly useful with sparse data, to mitigate a particular model being ranked too highly because of some random aspect of a single area's data set. This is obviously based on the premise that you expect something similar to be occurring across all areas (i.e., if there is a trap response or heterogeneity, it is of a similar nature across all areas).
Darryl

Postby howeer » Fri Nov 30, 2007 11:57 am


I have analyzed the pooled data, but thought I should first see whether the within-group forms of p variation were similar. In the pooled analysis, the behavioural response was supported across all groups, as were group effects, and AICc support was spread evenly among many models. I think the sparse data sets introduce model selection uncertainty into the pooled analysis. I would like to exclude the sparse data sets, and the models with behavioural effects, from the pooled analysis, based on the within-group GOF and model selection results showing that time + heterogeneity models fit best when the data were adequate to fit and select among models. That's the "judgement call" I want to make. Time + heterogeneity or simpler models, and similar p's among groups, are biologically realistic for black bear DNA-capture data.

