Model selection when data are sparse

Postby simone77 » Thu Oct 27, 2011 7:20 am

I hope this question is not too trivial, but it is something I often wonder about.
I am analyzing a CMR dataset on rabbits in an enclosure (see here for details). At each session, a random portion of the captured individuals (new and old captures) was blood sampled and their status (disease infected or not infected) was assessed.
This is a case of partial observation as described in Conn and Cooch (2009). I applied the classical approach of data censoring, and it resulted in a strongly reduced dataset, as shown below:

Before data censoring
[image]

After data censoring*
[image]

*One session is missing because no individuals were blood sampled on that occasion
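
For clarity, the censoring step amounts to recoding encounters of unknown disease status as non-detections before the analysis; a minimal sketch with made-up multi-state encounter histories (the A/B/U labels are hypothetical, not my actual coding):

    # Sketch of the censoring step (hypothetical encounter histories).
    # States: 'A' = not infected, 'B' = infected, 'U' = captured but not
    # blood sampled, '0' = not captured.
    # Unknown-state encounters are recoded as non-detections.
    def censor_unknown(history, unknown="U"):
        return history.replace(unknown, "0")

    histories = ["A0UAB", "UU0B0", "0A0U0"]
    print([censor_unknown(h) for h in histories])  # ['A00AB', '000B0', '0A000']

This is why the dataset shrinks so much: some histories lose most or all of their information.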

After modelling in MARK, I get these two as my best models:
[image]

whose parameter estimates are shown below:

1st model
[image]

2nd model
[image]

According to the AIC values and the LRT (chi-square = 4.027, df = 2, p = 0.1335), I would say that psi differs between sexes, but the psi estimate for one sex seems nonsensical (I guess due to data sparseness; probably for the same reason, some other estimates are also nonsensical).
I believe that in these cases I should consider that, due to the lack of data, there is simply not enough statistical power to detect some effects, and the fact that you don't find an effect does not necessarily mean there is no effect.
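
(As a sanity check on the arithmetic, that LRT p-value can be reproduced from the chi-square statistic and degrees of freedom; a minimal sketch in Python using scipy, with the statistic and df copied from the MARK output above:)

    # Reproduce the LRT p-value from the chi-square statistic and df.
    from scipy.stats import chi2

    lrt_stat = 4.027   # LRT chi-square, from the MARK output above
    df = 2             # difference in the number of estimable parameters
    print(round(chi2.sf(lrt_stat, df), 4))  # upper-tail probability, ~0.1335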

Do you have any advice or rule of thumb on how to "decide", when data are sparse, whether an effect is non-detectable or simply does not exist?
Up to now I have tended to look at the confidence intervals and at the estimates themselves (meaningful or meaningless) to get an idea, but this criterion is perhaps too subjective, so I wonder if you have any advice on this.

Simone

Re: Model selection when data are sparse

Postby abreton » Mon Oct 31, 2011 12:43 pm

It seems tragic to throw out so much information (data); why not apply the alternative models developed by Conn and Cooch (2009)? Conn and Cooch (2009) write,

One relatively common analytical approach when confronted with ambiguity in state determination is to censor all encounters where the state of an animal cannot be ascertained.


And their alternative,

Here, we present an alternative approach, which uses a hidden Markov (or multievent) modelling framework that can incorporate data from encounters of unknown state. Using simulation, we show that our approach leads to estimators of state-specific survival and transition probabilities that are more precise, and sometimes considerably so, than methods based on censoring.


After scanning the article, it appears that the authors fit their model using programs E-SURGE and M-SURGE. Sometimes MARK can be 'tricked' by clever manipulators into building alternative models, i.e., model parameterizations that are not listed as available data types in MARK. However, Paul Conn and Evan Cooch are about as 'clever' as anyone using MARK, so perhaps your only option is M/E-SURGE. The heterogeneity you've mentioned may, perhaps in large part, be coming from state transitions. If you can figure out how to fit your data to this model, the problem with phi4 and phi5 may go away, and you may get your most biologically plausible estimates of phi (as well as p and psi) to date.
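
Very roughly, the appeal of the multievent framework (a sketch, not the authors' exact parameterization) is that each underlying disease state is linked to the possible observed events through detection and state-assignment probabilities, so captures of unknown state still contribute to the likelihood rather than being censored:

    # Sketch of a multievent-style observation matrix (hypothetical values).
    import numpy as np

    p = 0.6      # detection probability (made up)
    delta = 0.4  # probability an encountered animal has its state assessed (made up)

    # Rows: true states (not infected, infected).
    # Columns: events (not seen, seen & assigned uninfected,
    #                  seen & assigned infected, seen but state unknown).
    event_probs = np.array([
        [1 - p, p * delta, 0.0,       p * (1 - delta)],
        [1 - p, 0.0,       p * delta, p * (1 - delta)],
    ])
    print(event_probs.sum(axis=1))  # each row sums to 1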

Regarding your inquiries, those estimates that come out at 1.0 appear to be legitimate, i.e., I think MARK is estimating them, without any problem, as 1.0 with a small standard error in both cases. Similarly, parameter 14 also seems legitimate based on its SE (nothing unexpected here). A side note: if you haven't already, be sure to use the mlogit link function in the second model for the state transitions (see the sidebar on page 10-22 of the MARK book).
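
For what it's worth, here is a rough sketch of what the mlogit link does (made-up beta values): it keeps the transitions out of a given state from summing to more than 1.

    # mlogit link sketch: betas for the transitions out of one state (made up).
    import math

    betas = [-0.5, -1.2]
    denom = 1.0 + sum(math.exp(b) for b in betas)
    psis = [math.exp(b) / denom for b in betas]
    print(psis, 1.0 - sum(psis))  # the remainder is the 'stay' probability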

Regarding effect size, see the suggestions on page 4-62 of the MARK book; section 6.12 discusses effect size in some detail. I think at the very least, if this were the 'end' of your analysis, that is, you weren't going to build any more models, then you'd start by model averaging (Burnham and Anderson 2002, Chapter 4). Inferences from this point would be based on the unconditional estimates of the parameters. I'd likely plot the model-averaged estimates and make inferences based on their similarities or differences, and I'd implore readers to interpret differences with caution if the overlap was considerable. I might also model-average the betas for use in (log) odds ratios (MARK book, page 6-71, section 6.12.1).
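
To make the model-averaging step concrete, here is a minimal sketch with made-up AICc values, estimates, and SEs for a single parameter, using one common form of the unconditional SE from Burnham and Anderson (2002):

    # Akaike weights and a model-averaged estimate (hypothetical numbers).
    import math

    # (AICc, estimate, SE) for the same real parameter from each candidate model.
    models = [(250.1, 0.82, 0.05), (252.3, 0.74, 0.08)]

    best = min(aicc for aicc, _, _ in models)
    raw_w = [math.exp(-0.5 * (aicc - best)) for aicc, _, _ in models]
    weights = [w / sum(raw_w) for w in raw_w]

    avg = sum(w * est for w, (_, est, _) in zip(weights, models))
    uncond_se = sum(w * math.sqrt(se ** 2 + (est - avg) ** 2)
                    for w, (_, est, se) in zip(weights, models))
    print(weights, round(avg, 3), round(uncond_se, 3))

The same weights can be applied to the betas if you go the (log) odds-ratio route.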

Keep searching; I think you're getting close now to the most sensible data type/model parameterization given your data... Conn and Cooch (2009)?

andre

Re: Model selection when data are sparse

Postby simone77 » Tue Nov 01, 2011 8:53 am

Very good suggestions!
In particular, I will study chapter 6 of the Gentle Introduction (the MARK book) again, because it might be quite useful for making things clearer.

abreton wrote: It seems tragic to throw out so much information (data); why not apply the alternative models developed by Conn and Cooch (2009)?


I couldn't agree more but, at least for now, I am not able to. I think I will be able to run this same analysis in E-SURGE in a few months (I will attend a workshop on that software in France), and in that case I will post here to report how (and whether) it has improved things.

Simone

