Data sparseness and residual plots

questions concerning analysis/theory using program MARK


Postby jbauder » Fri Feb 10, 2012 12:59 pm

I know this post is similar to an earlier post but I'm not sure the response to that post adequately addressed my particular situation.
I am continuing to analyze a rattlesnake mark-recapture data set with very sparse data. The data were collected over 14 years (32 sampling occasions) and include 32 groups to model the effects of age, den, sex, and season (winter vs. summer). I have no individual covariates. My recapture probability is low (0.05-0.20, averaging 0.10). My data are too sparse to fit a model with either an interactive or an additive effect of time, so my global model includes an age*den*sex*season interactive effect. Using a bootstrap GOF, c-hat is about 1.5, which is consistent with other similarly structured models. Normally I would be comfortable using this value of c-hat with QAIC, since I cannot think of any other biological reason for overdispersion and my data will support a more complex model.

However, my deviance residual plot is very asymmetrical. The points above the zero line are uniformly distributed, but there is a large cluster of points just below the zero line. I checked the residual output, and I have many encounter histories with observed values of zero and expected values between zero and one. This appears simply to reflect my low recapture rate, since many (most) encounter histories were never observed in most groups. The same pattern appears even with very simple models (e.g., phi(den)p(den)) and with models using only a subset of the full data set.
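A small illustration of why unobserved histories tend to land just below zero. Assuming the Poisson-form deviance residual, sign(O - E) * sqrt(2[O ln(O/E) - (O - E)]), a history with O = 0 reduces to -sqrt(2E), so any cell with 0 < E < 1 falls between -sqrt(2) (about -1.41) and 0. This is a sketch under that assumed formula, not MARK's own residual code; the expected values below are hypothetical.

```python
import math


def deviance_residual(observed: float, expected: float) -> float:
    """Poisson-form deviance residual for one encounter-history cell (assumed form)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    # O * ln(O / E) is taken as 0 when O == 0 (its limit as O -> 0)
    term = observed * math.log(observed / expected) if observed > 0 else 0.0
    d2 = 2.0 * (term - (observed - expected))
    sign = 1.0 if observed >= expected else -1.0
    return sign * math.sqrt(max(d2, 0.0))


# Hypothetical never-observed histories (O = 0) with small expected values
for e in (0.05, 0.2, 0.5, 0.9):
    print(f"E = {e:.2f}  residual = {deviance_residual(0, e):.3f}")
```

Because many zero-count cells with small E each give a small negative value, they can form a dense band just under the zero line.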
Would simply proceeding with the analysis using QAIC be acceptable, given that my "global" model has good fit based on the bootstrap GOF? All of the parameter estimates I have obtained in this analysis are what I expected and are consistent with the literature. I cannot think of ways to simplify my data further, since my simple models have the same issue. The data are too sparse to run TSM models for all groups. Do I need to reduce the number of groups in the .inp file? Or are these data too sparse for any CJS analysis?
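For reference, a minimal sketch of the small-sample QAIC (QAICc) calculation, QAICc = -2lnL / c-hat + 2K + 2K(K+1)/(n - K - 1). K here counts estimated parameters only; some authors add 1 to K for estimating c-hat, which is not done here. The function name and the model values in the usage lines are hypothetical.

```python
def qaicc(neg2lnl: float, k: int, n_eff: int, c_hat: float = 1.0) -> float:
    """QAICc = -2lnL / c_hat + 2K + 2K(K + 1) / (n - K - 1).

    With c_hat = 1 this reduces to ordinary AICc.
    """
    if c_hat <= 0:
        raise ValueError("c_hat must be positive")
    if n_eff - k - 1 <= 0:
        raise ValueError("effective sample size too small for this K")
    return neg2lnl / c_hat + 2 * k + (2 * k * (k + 1)) / (n_eff - k - 1)


# Two hypothetical models fitted to the same data
simple = {"neg2lnl": 2000.0, "k": 10}
complex_ = {"neg2lnl": 1980.0, "k": 20}
for c in (1.0, 1.5):
    delta = qaicc(**complex_, n_eff=3000, c_hat=c) - qaicc(**simple, n_eff=3000, c_hat=c)
    print(f"c-hat = {c}: delta QAICc (complex - simple) = {delta:.2f}")
```

Dividing -2lnL by c-hat shrinks likelihood differences between models, so with c-hat > 1 a more complex model needs a larger improvement in fit to be favored.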

Thank you very much for your help
Javan
jbauder
 
Posts: 56
Joined: Wed May 25, 2011 12:01 pm

Re: Data sparseness and residual plots

Postby cooch » Fri Feb 10, 2012 1:33 pm

A few quick points:


1\ Data sparseness per se is not likely to contribute to strong asymmetry in the residual plot. Typically, such asymmetry reflects structural lack of fit. The TSM (time-since-marking) structure is a common culprit, but you suggest your data are too sparse to fit those models. Given your description of the data, and given that the data are not likely to let you try more elegant approaches to 'solving' structural lack of fit, you'll have to accept that there are real limits to the inference(s) you can make. I'd start by collapsing things down as much as possible -- over groups, and pooling over occasions if need be. If you can get 'decent fit' that way, you might be able to start parsing things.

2\ Also, with that many levels in your interactions, you're very likely (verging on certain) to have some strong interactions, which will be difficult to parse. You might then decide pre-emptively to build models that partition some of the interactions in the first place; for example, run a model set 'by sex'.

3\ With a lot of sparseness, I'd be suspicious of the estimated c-hat. Try the median c-hat approach to see how it compares with the value from the bootstrap (which is known to generally be biased).
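As a point of comparison, the bootstrap GOF is usually turned into a c-hat estimate in one of two ways: the observed deviance divided by the mean deviance of the simulated data sets, or the observed c-hat (deviance/df) divided by the mean simulated c-hat. A minimal sketch of both, with hypothetical inputs; the median c-hat procedure uses a different simulation scheme and is not shown here.

```python
from statistics import mean


def bootstrap_c_hat(observed_deviance, observed_df, simulated_deviances, simulated_dfs):
    """Return two bootstrap-based c-hat estimates (sketch; inputs are hypothetical)."""
    # Approach 1: observed deviance / mean deviance of the simulated data sets
    c_dev = observed_deviance / mean(simulated_deviances)
    # Approach 2: observed c-hat (deviance/df) / mean of the simulated c-hats
    simulated_c = [d / df for d, df in zip(simulated_deviances, simulated_dfs)]
    c_ratio = (observed_deviance / observed_df) / mean(simulated_c)
    return c_dev, c_ratio


# Hypothetical observed fit and four simulated data sets
print(bootstrap_c_hat(300.0, 200, [190.0, 210.0, 205.0, 195.0], [200, 200, 200, 200]))
```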
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Data sparseness and residual plots

Postby jbauder » Fri Feb 10, 2012 2:11 pm

Thanks for your quick response. I have a few things I would like to clarify about your suggestions:

1\ Our capture occasions occurred during the spring and fall of each year, as the snakes were coming out or going under. I did not pool these two occasions because they are separated by several months and mortality could occur during both the summer and the winter. But given that my data are so sparse, would an acceptable next step be to condense my spring and fall capture occasions into one annual occasion? Like you mentioned, this may be a limitation I need to work with.
2\ I have run models without the interactions among groups (e.g., phi(sex)p(sex)). These models still show the same pattern in the residuals. Were you referring to something else when you suggested running models without interactions?
3\ I will give this a try this weekend. Based on how long it took the bootstrap GOF to run, I think estimating median c-hat will be my justification for a newer, faster computer.
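Pooling the spring and fall occasions amounts to an OR over each pair of columns in the encounter history: an animal is "encountered" in a year if it was seen on either occasion. A minimal sketch, assuming occasions are ordered (spring, fall) within each year, an even number of occasions, and one MARK-style .inp record per line of the form `history freq1 freq2 ...;` (comment lines are not handled). The function names are hypothetical.

```python
def pool_pairs(history: str) -> str:
    """Collapse consecutive pairs of occasions (spring, fall) into one annual occasion."""
    if len(history) % 2 != 0:
        raise ValueError("expected an even number of occasions")
    return "".join(
        "1" if "1" in history[i:i + 2] else "0"
        for i in range(0, len(history), 2)
    )


def pool_inp_line(line: str) -> str:
    """Rewrite one 'history freq1 freq2 ...;' record with pooled occasions."""
    fields = line.strip().rstrip(";").split()
    history, freqs = fields[0], fields[1:]
    return " ".join([pool_pairs(history)] + freqs) + ";"


print(pool_inp_line("10010011 1 0;"))  # prints "1101 1 0;"
```

The cost is the one noted above: once occasions are pooled, summer and winter mortality can no longer be separated.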

Thanks again

Re: Data sparseness and residual plots

Postby ehileman » Sat Feb 11, 2012 10:09 pm

Hi Javan,

In my experience, snakes generally yield low recapture probabilities. With the low recapture probabilities you are observing, I suspect you might need to collapse your within-year sampling occasions into annual occasions, as Evan suggested, but this shouldn't be an issue with relatively long-lived animals like rattlesnakes. Additionally, interaction terms are often difficult, if not impossible, to estimate with sparse data, especially if you have many of them. By pooling data within years and reducing the number of (or eliminating) your interaction terms (perhaps using some additive terms instead), do you think you could run TSM models?
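To see why replacing interactions with additive terms helps so much, compare the nominal parameter counts. A minimal sketch, assuming each of phi and p gets the same structure, an additive model is a shared intercept plus group and time effects on the link scale (G + T - 1), and ignoring CJS confounding of the last phi and p. The function name is hypothetical; MARK may report a smaller K if some parameters are not estimable.

```python
def n_params(groups: int, intervals: int, structure: str) -> int:
    """Nominal number of parameters for ONE parameter type (phi or p)."""
    if structure == "g*t":
        return groups * intervals  # one estimate per group x interval
    if structure == "g+t":
        return groups + intervals - 1  # intercept + (G - 1) group + (T - 1) time
    if structure == "g":
        return groups
    if structure == "t":
        return intervals
    raise ValueError(f"unknown structure: {structure}")


# 32 groups and 32 occasions (31 intervals), as in the original post
for s in ("g*t", "g+t", "g", "t"):
    print(s, 2 * n_params(32, 31, s))  # doubled: phi and p
```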

Just out of curiosity, what is the N/K for your global model? Are all of the parameters estimable for your global model? I would be surprised if they are with sparse data if K is large. Beyond the known bias of the bootstrap GOF, I wonder if inestimable parameters in your global model could be giving you an unrealistically low overdispersion estimate. Regarding your asymmetrical plots, does the asymmetry correspond with communal den captures? Aggregations of snakes, like those found in communal hibernacula, are known to exhibit a lack of independence. Food for thought.

Cheers!
Eric
ehileman
 
Posts: 51
Joined: Sat Nov 26, 2011 6:40 pm
Location: West Virginia University

Re: Data sparseness and residual plots

Postby jbauder » Tue Feb 21, 2012 2:17 pm

I have tried collapsing my sampling occasions from a fall and a spring occasion into a single annual occasion, and reducing the number of groups to include a single effect (i.e., just den or just age). This seems to have solved the problem of asymmetrical residual plots. My data still appear too sparse to use fully time-dependent models; my recapture rates are typically around 0.10-0.30. However, if I use an additive effect of time, I can estimate all of my parameters, so I have used a phi(group+time)p(group+time) model as my "global" model. The c-hat values for these global models (from both median c-hat and the bootstrap GOF) are around 1.10-1.20, so I would use QAIC to correct for overdispersion.
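Correcting for overdispersion with c-hat usually also means inflating variances by c-hat, so standard errors grow by sqrt(c-hat). A minimal sketch, assuming a 95% interval built on the logit scale with a delta-method SE; the survival estimate and SE in the usage lines are hypothetical, not values from this thread, and this is not a claim about MARK's internal code.

```python
import math


def inflate_se(se: float, c_hat: float) -> float:
    """Quasi-likelihood adjustment: variance times c_hat, so SE times sqrt(c_hat)."""
    return se * math.sqrt(c_hat)


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_ci(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a probability, built on the logit scale."""
    logit = math.log(estimate / (1.0 - estimate))
    se_logit = se / (estimate * (1.0 - estimate))  # delta-method SE on the logit scale
    return expit(logit - z * se_logit), expit(logit + z * se_logit)


# Hypothetical survival estimate and SE, adjusted with c-hat = 1.2
phi, se = 0.80, 0.03
print("unadjusted CI:", logit_ci(phi, se))
print("c-hat adjusted CI:", logit_ci(phi, inflate_se(se, 1.2)))
```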
However, I now have the issue that I cannot directly compare the effects of age, den, and sex, so I am not sure which factor is more influential or has the strongest effect on survival and recapture rate. If I combine any of these groupings in my .inp files (i.e., have groups for both den and sex), my deviance residual plots become asymmetrical again, even though the median c-hat estimates indicate only moderate overdispersion (1.20-1.50). I understand that I am limited by this data set, but I would like to know if there is a way to compare the strength of the effects of den, age, and sex to determine which factor is "best supported" by the data.
Can I compare QAIC values among these three sets of models? For example, could I compare models using the .inp file that has groups only for age with models from the .inp file that has groups only for den? The actual encounter histories are exactly the same, but the number of groups differs, as does the number of snakes in each group. I have compared the output for the phi(.)p(.) model in each set, and the -2log likelihood is exactly the same, but the deviance, AIC values, and degrees of freedom differ. Is this a case of comparing apples to oranges?
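One way to see why the deviance can change while -2lnL stays the same: deviance is -2lnL of the model minus -2lnL of a saturated model, and the saturated model is fitted within each group, so it depends on how the same animals are partitioned. A minimal sketch in a simplified multinomial setting (one multinomial per group over observed histories; MARK's additional conditioning on release cohort is ignored). The animals, dens, and ages below are hypothetical. This only illustrates the deviance difference; it does not settle whether QAIC values from different .inp files are comparable.

```python
import math
from collections import defaultdict


def saturated_neg2lnl(records):
    """-2 lnL of a saturated multinomial model fitted separately within each group.

    records: iterable of (group_label, encounter_history), one per animal.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for group, history in records:
        counts[group][history] += 1
    lnl = 0.0
    for hist_counts in counts.values():
        n_group = sum(hist_counts.values())
        for n in hist_counts.values():
            lnl += n * math.log(n / n_group)
    return -2.0 * lnl


# Hypothetical animals: (den, age, encounter history)
animals = [
    ("A", "adult", "110"), ("A", "juvenile", "100"), ("A", "adult", "100"),
    ("B", "adult", "110"), ("B", "juvenile", "101"), ("B", "juvenile", "100"),
]
by_den = saturated_neg2lnl((den, h) for den, _, h in animals)
by_age = saturated_neg2lnl((age, h) for _, age, h in animals)
ungrouped = saturated_neg2lnl(("all", h) for _, _, h in animals)
print(by_den, by_age, ungrouped)
```

Since the phi(.)p(.) likelihood is the same under both groupings, its deviance differs by exactly the difference in the saturated terms.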

Javan

Re: Data sparseness and residual plots

Postby jbauder » Tue Feb 21, 2012 2:25 pm

Hi Eric
Thanks for your post. As I wrote in my latest reply, I did try pooling data within years and reducing the number of groups, and that seemed to solve the issue of asymmetrical residual plots. My "global models" (which estimate all my parameters) now have about 20-22 parameters (depending on which groups I put in the model), and my effective sample size is 3602. That is a large sample, but apparently the recapture rates are still a little too low. I am not sure whether captures at communal hibernacula are causing any issues with the residual plots. The condensed data show a little overdispersion (c-hat > 1.10), which could be caused by lack of independence, but I think the issue with the residual plots was caused by having so few recaptures relative to the number of groups I was using.
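Plugging the numbers reported in this post into the N/K ratio, along with the small-sample correction term that AICc/QAICc adds, 2K(K+1)/(n - K - 1). A commonly cited guideline (Burnham and Anderson) is to use the small-sample version when N/K is below about 40. The function names are hypothetical, and a large N/K describes overall sample size; it does not by itself say anything about sparse cells caused by low recapture rates.

```python
def n_over_k(n_eff: int, k: int) -> float:
    """Ratio of effective sample size to number of estimated parameters."""
    if k <= 0:
        raise ValueError("K must be positive")
    return n_eff / k


def aicc_correction(n_eff: int, k: int) -> float:
    """Small-sample term added to AIC (or QAIC): 2K(K + 1) / (n - K - 1)."""
    if n_eff - k - 1 <= 0:
        raise ValueError("effective sample size too small for this K")
    return 2 * k * (k + 1) / (n_eff - k - 1)


for k in (20, 22):
    print(f"K = {k}: N/K = {n_over_k(3602, k):.1f}, "
          f"AICc correction = {aicc_correction(3602, k):.3f}")
```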
Javan

