Median c-hat v Bootstrapping GOF

questions concerning analysis/theory using program MARK


Post by Matt Stevens » Tue Jun 16, 2009 5:33 pm

Hi all,

I'm attempting survival modeling for a whole suite of African passerine species using MARK (using retrap-only data). Some of the species have few retraps, others many. My problem lies with the GOF tests: for species with few retraps I'm getting decent c-hats (i.e. <3 using bootstrapping), but species with more data produce c-hats >4. That in itself doesn't concern me too much, but if I run GOF tests using the median c-hat approach, my c-hats very rarely resemble those obtained using the bootstrapping method. E.g. for one species, testing a very basic s(.)p(.) model, I get a c-hat of 7.02 using bootstrapping but 1.19 using median c-hat!
I've generally been running the median c method using 1-3 as the min and max bounds.
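
For comparison, the bootstrap c-hat is usually obtained by setting the observed fit against many datasets simulated under the same model. Two ratios are commonly used: the observed deviance divided by the mean simulated deviance, or the observed c-hat (deviance/df) divided by the mean simulated c-hat. A minimal Python sketch, assuming you have already exported the observed and simulated deviances and their degrees of freedom; the function name and inputs are hypothetical and not part of MARK.

```python
from statistics import mean

def bootstrap_c_hats(obs_deviance, obs_df, sim_deviances, sim_dfs):
    """Return two bootstrap-based c-hat estimates (illustrative sketch).

    obs_deviance, obs_df: deviance and deviance df of the model fit to the real data.
    sim_deviances, sim_dfs: the same quantities for each simulated dataset.
    """
    # Ratio 1: observed deviance relative to the average simulated deviance.
    c_from_deviance = obs_deviance / mean(sim_deviances)
    # Ratio 2: observed c-hat (deviance/df) relative to the average simulated c-hat.
    obs_c = obs_deviance / obs_df
    sim_c = [d / df for d, df in zip(sim_deviances, sim_dfs)]
    c_from_c_hat = obs_c / mean(sim_c)
    return c_from_deviance, c_from_c_hat

# Made-up numbers, only to show the call.
print(bootstrap_c_hats(140.0, 20, [18.0, 20.0, 22.0], [19, 20, 21]))
```

When the simulated df barely vary, the two ratios agree closely; they diverge when the df differ across simulated datasets.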

I've RTFM many times now and assessed and re-assessed my data but still can't resolve this. Any thoughts would be very welcome.

Cheers,

Matt
Matt Stevens
 
Posts: 2
Joined: Sat Feb 10, 2007 9:39 am
Location: Univ St. Andrews, Scotland & APLORI, Nigeria

Re: Median c-hat v Bootstrapping GOF

Post by cooch » Tue Jun 16, 2009 6:25 pm



1. Don't bother with the bootstrap. Use either RELEASE (if your general model is fully time-dependent) or the median c-hat (for that model and anything else).

2. You need only run the GOF on the most general model in the candidate set of approximating models.

3. The median c-hat estimates c on the assumption that lack of fit is due entirely to extra-binomial noise. If that isn't the case (e.g., if the model structure is inappropriate for your data), the estimated c-hat won't mean much.

4. Most of the time, if c-hat is estimated >3, the problem is an interaction of (i) sparse data and/or (ii) structural problems (i.e., the wrong general model). If (i), use a slightly less parameterized general model. If (ii), you need to think hard about the possible sources of variation in your data. Often, fitting a TSM (time-since-marking) model solves a multitude of problems, since even a single TSM class can help soak up some heterogeneity.

5. If you're stuck with sparse data and can't identify a possible structural problem, there isn't much you can do: you'll be stuck fitting very simple models.

All of the preceding is in the GOF chapter.
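
The median c-hat idea in point 3 can be sketched as follows, under our reading of the approach: simulate data at several known values of c between the lower and upper bounds (1 and 3 in Matt's runs), score each simulated dataset 1 if its deviance c-hat is at least the observed c-hat and 0 otherwise, fit a logistic regression of that score on c, and take the c at which the fitted probability is 0.5, i.e. -b0/b1. This is a minimal sketch, not MARK's internal code; the function names are hypothetical, and the Newton-Raphson fit has no safeguard against separation (e.g., every simulated c-hat falling on one side of the observed value).

```python
import math

def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by maximum likelihood (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information matrix times score.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (-i01 * g0 + i00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

def median_c_hat(c_values, sim_c_hats, observed_c_hat):
    """c at which half the simulated c-hats are expected to reach the observed c-hat."""
    ys = [1 if s >= observed_c_hat else 0 for s in sim_c_hats]
    b0, b1 = fit_logistic(c_values, ys)
    return -b0 / b1  # fitted probability is 0.5 where b0 + b1*c = 0

# Illustrative (made-up) simulations: 4 replicates at each of c = 1, 2, 3.
c_values = [1.0] * 4 + [2.0] * 4 + [3.0] * 4
sim = [1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 2.0, 2.0, 1.0, 2.0]
print(round(median_c_hat(c_values, sim, observed_c_hat=1.5), 3))  # prints 2.0
```

Each simulation step requires refitting the model, which is why the procedure can be slow.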
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Median c-hat v Bootstrapping

Post by Matt Stevens » Tue Jun 16, 2009 7:44 pm

Yes, thanks for that, the GOFs were only performed on the general models.

1. I started with fully time-dependent but, as you've mentioned in point 4, my data IS generally too sparse and I've had to go for much less parameterized general models. And yes, using a TSM approach is where I'm at and generally seems the most suitable.

2. I used the bootstrap to back up the median c-hat because of the mention in the GOF chapter of the median c-hat being a work in progress. As my data are too sparse to get anywhere with a fully time-dependent model I didn't think using RELEASE was a valid option (too many cells with too little data).
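
The sparse-cell concern can be made concrete. A common RELEASE-based estimate is c-hat = (chi-square of TEST 2 + TEST 3) / (their total df). Each component, however, is a contingency-table chi-square, which becomes unreliable when many expected counts are small. A minimal sketch, assuming you have copied the component chi-square values, df, and expected counts from the RELEASE output by hand; the function names are hypothetical, and the threshold of 5 is the usual chi-square rule of thumb, not a MARK setting.

```python
def release_c_hat(components):
    """c-hat from RELEASE goodness-of-fit components.

    components: list of (chi_square, df) pairs, e.g. the TEST 2 and TEST 3
    totals copied from the RELEASE output.
    """
    chi2 = sum(x for x, _ in components)
    df = sum(d for _, d in components)
    if df == 0:
        raise ValueError("no degrees of freedom: the tests carry no information")
    return chi2 / df

def sparse_fraction(expected_counts, threshold=5.0):
    """Fraction of contingency-table cells whose expected count is below threshold."""
    if not expected_counts:
        raise ValueError("no cells supplied")
    small = sum(1 for e in expected_counts if e < threshold)
    return small / len(expected_counts)

# Made-up values, only to show the calls.
print(release_c_hat([(30.0, 10), (12.0, 4)]))  # (30 + 12) / (10 + 4) -> 3.0
print(sparse_fraction([1.2, 8.0, 3.0, 10.0]))  # 2 of 4 cells below 5 -> 0.5
```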

I'm getting negative c-hats on occasion for certain species so it seems there's a more deep-rooted structural problem for these.

Thanks again for the advice.
Matt Stevens
 

Re: Median c-hat v Bootstrapping

Post by cooch » Tue Jun 16, 2009 7:53 pm

Matt Stevens wrote:
Yes, thanks for that, the GOFs were only performed on the general models.

1. I started with fully time-dependent but, as you've mentioned in point 4, my data IS generally too sparse and I've had to go for much less parameterized general models. And yes, using a TSM approach is where I'm at and generally seems the most suitable.


If it's passerines, TSM models almost always help.
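
As a rough illustration of what a TSM structure does: consider a CJS survival parameter index matrix (PIM) with rows as marking cohorts and columns as intervals. A two-class TSM model gives the interval immediately after marking (the diagonal) its own parameter, which can absorb transients, and shares a second parameter across all later intervals. A minimal sketch, assuming constant survival within each class; the function name is hypothetical, and it only builds the index layout without fitting anything.

```python
def tsm_pim(n_occasions):
    """Survival PIM for a two-class time-since-marking (TSM) model.

    Rows are marking cohorts, columns are intervals; None marks cells that
    do not exist (intervals before the cohort was first released).
    Index 1 = first interval after marking, index 2 = all later intervals.
    """
    n_intervals = n_occasions - 1
    pim = []
    for cohort in range(n_intervals):
        row = []
        for interval in range(n_intervals):
            if interval < cohort:
                row.append(None)
            elif interval == cohort:
                row.append(1)   # survival in the interval just after marking
            else:
                row.append(2)   # survival in every later interval
        pim.append(row)
    return pim

for row in tsm_pim(5):
    print(row)
# [1, 2, 2, 2]
# [None, 1, 2, 2]
# [None, None, 1, 2]
# [None, None, None, 1]
```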

2. I used the bootstrap to back up the median c-hat because of the mention in the GOF chapter of the median c-hat being a work in progress. As my data are too sparse to get anywhere with a fully time-dependent model I didn't think using RELEASE was a valid option (too many cells with too little data).


All correct. All GOF tests are works in progress, since there is no perfect 'tool'. Of the bunch, the median c-hat seems to have the best performance and should be used in preference to anything else. It can be slow, but you only need to do it once.

I'm getting negative c-hats on occasion for certain species so it seems there's a more deep-rooted structural problem for these.

Thanks again for the advice.


If you understand what the median c-hat is doing, then you'll understand what might cause a 'negative' c-hat (clearly, the logistic model being fit to the frequencies is way off in those cases).
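
To see how a 'negative' c-hat can fall out of the median c-hat procedure: the estimate is the point where the fitted logistic line crosses 0.5, c = -b0/b1, and nothing forces that point into the simulated range. If, even at the lower bound, most simulated c-hats already exceed the observed one, the crossing is extrapolated below the bound and can go negative. A minimal sketch, assuming you already have the fitted intercept b0 and slope b1 (hypothetical inputs); it simply flags crossings outside the bounds.

```python
def c_hat_from_logistic(b0, b1, lower, upper):
    """Return (c, outside_bounds) for the fitted logistic crossing point.

    b0, b1: intercept and slope of logit P(simulated c-hat >= observed) vs c.
    lower, upper: the range of c values that were simulated.
    """
    if b1 == 0:
        raise ValueError("flat fit: the logistic curve never crosses 0.5")
    c = -b0 / b1  # where b0 + b1 * c = 0, i.e. fitted probability = 0.5
    outside = not (lower <= c <= upper)
    return c, outside

# Illustrative (made-up) coefficients: the fit already exceeds 0.5 at c = 1,
# so the crossing is extrapolated below the lower bound.
c, outside = c_hat_from_logistic(b0=1.0, b1=0.5, lower=1.0, upper=3.0)
print(c, outside)  # prints -2.0 True
```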

If you have lots of heterogeneity and sparse data, then, as per my earlier note, there might not be much you can glean from some of your data. No amount of 'MARK voodoo' will change the inference limits due to 'lousy' data.
cooch
 

