any new recommendations for c-hat < 1?

Postby Harpagus » Thu Sep 24, 2009 3:13 pm

Hi folks,

I'm wondering if anyone is aware of any new recommendations for the situation where c-hat < 1. I suspect that a c-hat < 1 is due to data sparseness -- is this true?

Shall I just set it to 1 and hold my nose?

If I lower the lower boundary in MARK's median c-hat procedure to something < 1, I get an estimate of c-hat < 1.

I've seen a paper that includes a dataset similar to (and probably as sparse as) mine that used the default lower boundary of 1, but doesn't this preclude the discovery of a c-hat < 1?

Comments and advice appreciated!

Stefan
Harpagus
 
Posts: 4
Joined: Thu Sep 24, 2009 1:43 pm

Underdispersion

Postby dhewitt » Thu Sep 24, 2009 4:06 pm

I admit that I haven't thought much about the issue of underdispersion (I work on fish and we more or less always deal with some overdispersion), but is it possible the model you are looking at (presumably your global model) has too much structure? When we find big c-hat values we suspect that there is structure we have missed -- some grouping(s) of fish that cause unaccounted-for heterogeneity. It seems to me that you may be in the opposite situation and that trimming back the model structure could help (I'd suspect estimation and fitting problems with a 'big' global model and a sparse data set anyway).
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532

Contradictory values of c-hat

Postby ADoug » Sat Sep 26, 2009 3:48 pm

Hi,
I would like to describe my case concerning this matter of c-hat < 1:

I am running a CJS model on four data sets (one of them pools all the data together). I used the median c-hat GOF test for two of them, and I also get a c-hat < 1. Like Stefan, I used the default lower boundary of 1 and got a c-hat of around 0.95. But, as Stefan describes, I also lowered the boundary in MARK's median c-hat procedure to something < 1 (to 0.64) and got a c-hat as low as 0.45!

I also ran the bootstrap GOF test for the same data (5000 simulations), and I realized that if I use the deviance approach to calculate c-hat (the observed deviance divided by the median of the simulated deviances) I also get a c-hat < 1. That is because most (if not all) of the simulated deviances are larger than the observed deviance. But if I calculate c-hat the other way the bootstrap permits (the observed c-hat divided by the median c-hat of the simulations), then I get a c-hat somewhat larger than 1 (as large as 1.57).
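
(For concreteness, a minimal Python sketch of the two bootstrap c-hat calculations described above. All numbers and the simulated vectors are hypothetical placeholders, not output from this analysis; MARK's bootstrap GOF routine reports these quantities directly.)

```python
import numpy as np

# Hypothetical observed quantities from the fitted global model
obs_deviance = 310.0   # observed model deviance
obs_chat = 1.40        # observed deviance c-hat (deviance / deviance df)

# Hypothetical stand-ins for the 5000 bootstrap simulations under the model
rng = np.random.default_rng(1)
sim_deviance = rng.normal(350.0, 25.0, 5000)   # simulated deviances
sim_chat = rng.normal(0.90, 0.15, 5000)        # simulated deviance c-hats

# Method 1: observed deviance divided by the median simulated deviance
chat_from_deviance = obs_deviance / np.median(sim_deviance)

# Method 2: observed c-hat divided by the median simulated c-hat
chat_from_chat = obs_chat / np.median(sim_chat)

# The two methods need not agree, which is the pattern described in the post
print(chat_from_deviance, chat_from_chat)
```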

That is very confusing to me. Does anyone know what it means when the bootstrap simulations produce deviances so much larger than the observed one? I suppose that is what pushes the c-hat below 1... but what now? Should I be conservative and use the larger c-hat that I found to correct for lack of fit? If I do, it changes the model rankings for some of the data sets, which has implications for the results.

I don't think I am using a "big" global model. I am running the GOF test on a TSM (time-since-marking) model (the best ranked in the AIC paradigm), since TEST 3 of RELEASE's GOF is significant and there are transients in the sample. I am using p(.) phi(M2 ./.) or the more parameterized p(t) phi(M2 ./.) in some of the data sets, but they all show the same pattern described above.

I would appreciate comments on this matter.
Thanks,
Alex
ADoug
 
Posts: 3
Joined: Fri Sep 25, 2009 12:47 pm
Location: UFSC - Brazil

c-hat < 1

Postby Harpagus » Wed Sep 30, 2009 9:45 am

I can expound a little on my situation:

1. Data are too sparse for a global model including time. I have no a priori reason to suspect year is a huge factor; I am most interested in generating Phi and p estimates for age and sex classes (which are the only two factors I include).

2. Adjusting model rankings using a c-hat < 1 does change things, but not a whole lot. Basically it goes from having two best models with more or less equal support to having one of those two ranked best, with the second falling ~3 AIC units behind it. Parameter estimates are more or less the same, based on model averaging (see the QAICc sketch after this list).

3. Phi and p estimates are basically useless (huge 95% CIs) for a few groups, but estimates for the other groups appear reasonable. I'm thinking that in a paper one could point out the difficulties of generating estimates for those problem groups. In fact, the differential resightability of some groups reveals something of interest about their ecology.

4. Is this a valid approach? I know one is supposed to do the GOF test on the most parameterized model with few estimation problems. But the options appear to be (1) reduce your general model to exclude factors of interest, in which case I'm basically SOL, or (2) accept that some parameters will be poorly (basically not) estimated. Naturally, I'm leaning towards 2.
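
(A minimal Python sketch, with hypothetical -2 log-likelihoods, parameter counts, and sample size, of how a chosen c-hat feeds into QAICc and therefore into the model rankings mentioned in point 2; MARK performs this adjustment internally once you enter a c-hat value.)

```python
def qaicc(neg2lnl, k, n, c_hat=1.0):
    """Quasi-likelihood AICc: neg2lnl is -2 log-likelihood, k the number of
    estimated parameters, n the effective sample size."""
    return neg2lnl / c_hat + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidate models: name -> (-2 log-likelihood, number of parameters)
models = {
    "phi(sex) p(.)": (412.3, 3),
    "phi(age+sex) p(.)": (408.9, 5),
}
n = 250  # hypothetical effective sample size

for c_hat in (0.8, 1.0):  # compare the estimated c-hat < 1 with c-hat fixed at 1
    scores = {name: qaicc(d, k, n, c_hat) for name, (d, k) in models.items()}
    best = min(scores.values())
    deltas = {name: round(s - best, 2) for name, s in scores.items()}
    print(c_hat, deltas)  # differences in support can shrink or grow with c-hat
```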

I would greatly appreciate additional comments.

Stefan
Harpagus
 
Posts: 4
Joined: Thu Sep 24, 2009 1:43 pm

Underdispersion

Postby dhewitt » Wed Sep 30, 2009 11:52 am

Your information in point #2 makes me think you don't have too much to worry about. BUT, you need to decide how to deal with those problem estimates with huge CIs. Are you using those in model-averaging? Are they on a boundary (0 or 1), and is that estimate reasonable (e.g., survival really 100%)? How many of the estimates are affected by this? Someone might correct me on this, but I think that since all of the parameters are linked up in the likelihood, if a bunch go awry others might be affected as well.
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532

underdispersion

Postby Harpagus » Wed Sep 30, 2009 5:58 pm

That's pretty much what I would like advice on: even though some parameter estimates are awful, do they affect other parameter estimates?

The estimates are along the lines of what one might expect (lower in some age/sex classes). But the 95% CIs span 0.1 to >1 for Phi and 0.1 to 0.9 for p.

I'm fine with accepting those as useless, just curious if my other estimates are as well-estimated as they seem.

Thanks!

Stefan
Harpagus
 
Posts: 4
Joined: Thu Sep 24, 2009 1:43 pm

Re: underdispersion

Postby cooch » Wed Sep 30, 2009 6:55 pm

Harpagus wrote: That's pretty much what I would like advice on: even though some parameter estimates are awful, do they affect other parameter estimates?

The estimates are along the lines of what one might expect (lower in some age/sex classes). But the 95% CIs span 0.1 to >1 for Phi and 0.1 to 0.9 for p.

I'm fine with accepting those as useless, just curious if my other estimates are as well-estimated as they seem.

Thanks!

Stefan


The two-sided 95% confidence interval on c (as reported by the median c-hat procedure) is obtained by picking off the 0.025 and 0.975 probability values from the logistic regression function. In addition, because the lower confidence bound on c is often less than 1, a one-sided 95% confidence bound is also provided. This bound is probably of more general use than the two-sided interval, given that c has a lower bound of 1.
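
(The following is a rough Python sketch of that logic, not MARK's actual code: simulated placeholder data stand in for the median c-hat design points, and the quantiles of c are read off the fitted logistic regression. All names, numbers, and the choice of the 0.95 crossing for the one-sided bound are assumptions for illustration.)

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

# Hypothetical median c-hat output: design values of c, and for each simulated data
# set an indicator of whether its deviance c-hat exceeded the observed deviance c-hat
rng = np.random.default_rng(2)
c_design = np.repeat([1.0, 1.25, 1.5, 1.75, 2.0], 100)
exceeds = (rng.random(c_design.size) < expit(4.0 * (c_design - 1.4))).astype(int)

# Logistic regression of the exceedance indicator on the design value of c
fit = sm.Logit(exceeds, sm.add_constant(c_design)).fit(disp=0)
b0, b1 = fit.params

# The value of c at which the fitted probability equals p gives that quantile of c-hat
def c_at(p):
    return (logit(p) - b0) / b1

median_chat = c_at(0.5)                    # the median c-hat itself
ci_two_sided = (c_at(0.025), c_at(0.975))  # two-sided 95% interval
upper_one_sided = c_at(0.95)               # one-sided 95% upper bound (assumed here)
print(median_chat, ci_two_sided, upper_one_sided)
```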

Now, what is important to understand here is that MARK generates values for c-hat used in the median c-hat approach by proposing that the 'lack of fit' is entirely extra-binomial (translation: degrees of non-independence amongst individuals). This is *all* it does. This is pointed out in Chapter 5, which also notes that if your lack of fit isn't extra-binomial, then applying the estimated c-hat is questionable.

Moreover, values of c-hat <1 suggest under-dispersion. You have to work pretty hard to come up with plausible biological arguments for under-dispersion.

So, as per various bits in chapter 5:

1. if there is no plausible biological rationale to believe that there is underdispersion, then set c-hat = 1 if the estimated c-hat is < 1.

2. you shouldn't set the lower design point for the median c-hat routine < 1 (even though you can). In fact, MARK can't simulate data for c-hat < 1, so you're not actually simulating below 1 anyway (although MARK doesn't tell you this).

3. the whole issue of lack of fit comes down to (i) is your model structure OK? If so, then (ii) is there a reason to expect extra-binomial noise (overdispersion)? If so, then estimate c-hat -- the median approach seems to work pretty well. If you think there is lack of fit that isn't extra-binomial, there isn't a lot you can do. The good news, I suppose, is that most organisms show some level of extra-binomial noise (since for many taxa there is some degree of non-independence).

4. don't forget that the estimate of c-hat is just that: an estimate. Look at the CI, and the upper bound, and think about what that means.
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Underdispersion, c-hat, and problem estimates

Postby dhewitt » Thu Oct 01, 2009 1:23 pm

Alex -- I have found that it is not unusual to get very different results for a c-hat estimate from the three methods of GOF assessment for CJS models (RELEASE, Bootstrap [2 methods], and Median c-hat). I have had c-hat range from 1.2 to 2.6 among methods for the same model, and of course model selection and parameter SEs are much affected between these endpoints. I go with median c-hat, but agree that this can be confusing and annoying. Median c-hat in my case is typically the lowest estimate, but I don't think it has to go one way.

Stefan -- I can't answer for your particular situation, but we often have 1-4 estimates out of 20-some that hit a boundary in a CJS model. The boundary is at 1.0 for Phi, which is actually quite plausible for us. [I'm still not sure whether you are dealing with boundary estimates or just big SEs. Sorry if I missed that in what you've written.] We avoid those estimates in model-averaging (or don't model-average when the weight is almost all behind one model in the set), and look at various model outputs to see whether the estimates are affected in less-parameterized models where they do not hit the boundary. When I am really concerned, I also run the MCMC routine for the model and look at the posterior distributions for the troublesome parameters. Again, my tentative conclusion from what you've said is that you are OK. If the other parameters are well estimated (low SE) and don't vary much amongst models, go with it.
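
(A simplified Python sketch, with hypothetical numbers, of the kind of model averaging described above: the boundary estimate is dropped, the weights are renormalised over the remaining models, and the unconditional SE follows the usual Burnham and Anderson form. MARK's own model-averaging output handles the details differently.)

```python
import numpy as np

# Hypothetical per-model results for one parameter (say Phi for one group):
# (QAICc, estimate, SE, hit_boundary)
models = [
    (512.4, 0.78, 0.06, False),
    (513.1, 0.81, 0.07, False),
    (515.9, 1.00, 0.00, True),   # boundary estimate (Phi = 1), excluded below
]

# Keep only models whose estimate is not on a boundary, then form Akaike weights
retained = [m for m in models if not m[3]]
qaicc = np.array([m[0] for m in retained])
w = np.exp(-0.5 * (qaicc - qaicc.min()))
w /= w.sum()

est = np.array([m[1] for m in retained])
se = np.array([m[2] for m in retained])

avg = float(np.sum(w * est))  # model-averaged estimate
# Unconditional SE combining within-model SE and between-model spread
uncond_se = float(np.sum(w * np.sqrt(se**2 + (est - avg)**2)))
print(avg, uncond_se)
```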

Evan -- I think Stefan was referring to the CIs on the actual model estimates rather than on c-hat. And unless I have been missing something for a long time, I don't see a CI on c-hat anywhere in the output -- just an SE. Right? I've never thought too hard about the SE since it is affected by the number of points/replicates you choose (since it's based on simulations). I check to be sure it isn't terrible, but it is almost always small for me. Am I missing something here? And, good tip not to let c-hat simulate below 1. Why doesn't Gary just not allow that?!
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532

Re: Underdispersion, c-hat, and problem estimates

Postby cooch » Thu Oct 01, 2009 2:54 pm

dhewitt wrote: And unless I have been missing something for a long time, I don't see a CI on c-hat anywhere in the output - just a SE.


The SE has been there for a while (from the outset, as I recall), and could be used to generate a CI in the usual fashion.

However, Gary has tweaked it recently so that it is now a more robust CI. He has also added the one-tailed upper bound.
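
(For what it's worth, a short sketch of the 'usual fashion', a simple Wald-type interval built from the reported SE, with hypothetical values; the newer MARK output now gives a more robust interval directly.)

```python
chat, se = 1.12, 0.08  # hypothetical median c-hat estimate and its reported SE
ci_95 = (chat - 1.96 * se, chat + 1.96 * se)  # rough two-sided 95% Wald interval
upper_one_tailed = chat + 1.645 * se          # rough one-tailed 95% upper bound
print(ci_95, upper_one_tailed)
```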

Why doesn't Gary just not allow that?!


I suspect that restriction will be implemented in the next release. It needs to be implemented in both the median c-hat routine, and in the simulation routines.


The basic point in play (either in this thread, or another) is that c-hat < 1 is probably not meaningful *biologically*, so you might as well make c-hat 1.0 (which, given the relationship between c and model df and model chi-square, is as low as you can get theoretically anyway). If by some fluke you actually have underdispersion, making c-hat 1.0 will simply make your analysis somewhat more conservative.
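
(A tiny Python sketch of that point, with hypothetical numbers: c-hat taken as model chi-square over df is floored at 1, and since SEs scale by the square root of c-hat, the floor can only widen intervals, never shrink them, hence 'conservative'.)

```python
chi_square, df = 41.7, 48          # hypothetical model chi-square and degrees of freedom
c_hat = max(chi_square / df, 1.0)  # estimates below 1 are set to 1 rather than used

naive_se = 0.05                          # hypothetical SE from the uncorrected model
adjusted_se = (c_hat ** 0.5) * naive_se  # flooring at 1 never deflates the SE
print(c_hat, adjusted_se)
```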
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

New MARK version

Postby dhewitt » Thu Oct 01, 2009 3:12 pm

Geesh -- it seems there is a new version every time I check, about once a month! We need Tweets to announce new MARK version releases.
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532
