GOF suggestions sought

questions concerning analysis/theory using program MARK

GOF suggestions sought

Postby birdman » Wed Aug 04, 2010 11:01 pm

I am trying to figure out a way to estimate c-hat. My data (described below) do not fit the CJS assumptions, because I know I have age effects on both survival and recapture: young fledglings have lower survival and recapture probabilities than older ones. I have therefore used an age-based model (or age+year, given substantial yearly variation) as my general model because, regardless of when they fledge, the previous statement holds. However, I found that median c-hat and the bootstrap do not work on encounter histories with "." values. What to do? If I replace the nulls with zeros, the information in my data changes substantially, the best-fitting models change, and the estimate of c-hat in the standard model output (which I know isn't reliable) nearly doubles, to around 11-12. I've read and reread the Gentle Intro, but I'm not finding a suitable workaround. Also, since there is substantial variation across years, this may be important in the general model. I'm assuming that to include it in something like median c-hat I'll need to use years as groups, since models with individual covariates can't be used in the median c-hat procedure.
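(For reference, what I'd do with the estimate is the standard quasi-likelihood adjustment described in the Gentle Intro: rank models by QAICc and inflate standard errors by sqrt(c-hat). As a reminder,

\[ \mathrm{QAIC}_c = \frac{-2\ln\mathcal{L}}{\hat{c}} + 2K + \frac{2K(K+1)}{n_{\mathrm{eff}}-K-1}, \]

where K is the number of estimated parameters and n_eff is the effective sample size.)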

Thanks for any input. I'll gladly clarify or answer other questions if that will help. Here is some additional, hopefully helpful, information.

I am attempting to model weekly survival of banded fledgling birds based on weekly observations. I have a large, multiyear dataset from a population where all nests are found and all nestlings are banded on day 11 post-hatching. The birds fledge on day 17-18 and are resighted weekly until they reach nutritional independence at about 85 days of age.

I am using the live recaptures data type; really, I only want to model survival. The birds don't move off-site during the period of interest, they become increasingly visible after the first couple of weeks, and they do not transition to another state until the end of this period.

Historically, weekly monitoring was curtailed in mid-July, when most birds had reached independence, so for some birds I have the whole encounter history, while for later-fledging individuals I have only the first weeks. I have set up the data such that individuals enter the banded population on their day of banding and spend one week in the nest (the first survival parameter is set to one, since only birds that fledge are considered in the analysis). Data across years are lumped by calendar week; that is, each year some birds enter in week 15, some in week 16, and so on.

Each bird's encounter history is then 1 = seen in a given week, 0 = not seen, "." = no data (birds for which the historical records don't say for certain that a search took place that week). I was told that the live recaptures data type will now handle null (.) values, and that this was more appropriate than filling in zeros. The periods usually occur after birds were reasonably known to have died, but occasionally there are gaps where a week was missed for some reason. The earliest birds enter about week 14-15, the latest around week 24-25. I currently have about 1500 records of individual fledglings across 14 years, with data going back much further that isn't yet available in electronic format.
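For concreteness, here is roughly how I generate the histories from our weekly field records (a minimal sketch; the data structure and field names are made up, not from our actual database):

# Minimal sketch: build a MARK encounter-history string for one bird.
# `records` maps week -> (surveyed, seen); both flags are hypothetical.

def build_eh(records, first_week, last_week):
    """'1' = seen, '0' = searched but not seen, '.' = no documented search."""
    eh = []
    for week in range(first_week, last_week + 1):
        surveyed, seen = records.get(week, (False, False))
        if seen:
            eh.append("1")
        elif surveyed:
            eh.append("0")
        else:
            eh.append(".")  # no data for that week
    return "".join(eh)

# Bird banded in week 15, seen weeks 15 and 17, searched without a
# sighting in week 16, no documented search in week 18:
records = {15: (True, True), 16: (True, False), 17: (True, True)}
print(build_eh(records, 15, 18))  # -> "101."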
birdman
 
Posts: 34
Joined: Wed Oct 24, 2007 4:14 pm

Re: GOF suggestions sought

Postby ToddArnold » Thu Aug 05, 2010 6:24 pm

Can you append a couple lines of data from your .inp file with an explanation of what each capture history means?

It sounds like your encounter histories are structured based on time (e.g., ....11000 is a bird fledging in week 6 and never seen again) rather than age (where all encounter histories start with a 1 representing the nestling week, e.g., 1100000000). For nests or offspring, where you're looking at daily or weekly survival, I like structuring encounter histories based on age and pulling in temporal variation with covariates, rather than the converse; if you do this, you won't need to use periods for left-censored observations, because every capture history will begin with a 1.

You shouldn't censor out birds that you know have died, unless you caused their death (or unless you're using a Barker or Burnham model to incorporate when they died). They should have a terminal string of zeros, not periods. And if you didn't encounter a 3-week-old fledgling because you didn't search that week, that's still a zero, not a period (if you want to discriminate between bird-related vs. search-related detection failure, assign an occasion-specific covariate that measures search effort). You really only need periods in the encounter histories to account for birds that were right-censored because your field season ended before they reached the "terminal age" (85 days?) up to which you monitor post-fledging survival. Assuming that's a small subset of your data, I would run a GOF test (median c-hat) on the lion's share of your data, which isn't right-censored.
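To make the restructuring and censoring rules concrete, here's a rough sketch (my own hypothetical helper, not anything in MARK; `right_censored` flags birds whose field season ended before terminal age):

# Convert a time-structured EH (leading '.' before banding) into an
# age-structured one in which every history starts at the banding week.

def to_age_based(time_eh, n_age_occasions, right_censored):
    core = time_eh.lstrip(".")      # drop the left-censoring periods
    core = core.replace(".", "0")   # internal misses are zeros, not periods
    if right_censored:
        # season ended before terminal age: pad with '.' on the right
        return (core + "." * n_age_occasions)[:n_age_occasions]
    # otherwise followed to terminal age (or known dead): pad with zeros
    return (core + "0" * n_age_occasions)[:n_age_occasions]

# e.g. "....11000" (time-based) becomes "110000000" if the bird was
# followed to terminal age, or "11000...." if the season ended early:
print(to_age_based("....11000", 9, right_censored=False))
print(to_age_based("....11000", 9, right_censored=True))

-Todd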
ToddArnold
 
Posts: 3
Joined: Thu Aug 27, 2009 5:20 pm
Location: University of Minnesota

Re: GOF suggestions sought

Postby birdman » Fri Aug 06, 2010 8:47 am

Todd-

Thanks for your input and anything further you can provide. First, in response to a couple of your comments/questions:

You are correct that I've essentially structured this in a temporal format, with birds only entering once they are banded, and periods preceding that. I had thought about doing it as you suggested but, through talks with my collaborators, decided on the current format. I think your approach might provide an easier path. Do you still use the live recaptures only data type? I don't think I need any of the more complex data types at this stage, though I will as I start working on survival of birds from their first fall through adult life, including modeling breeding probabilities and such.

Regarding the use of periods, I was simply under the impression that they were more appropriate in cases where no data exist. Based on my knowledge of this study over the years preceding my arrival, I am confident that in most cases all birds were searched for weekly until they were known to have died, and most territories were monitored for renest attempts, so filling with zeros is not really problematic; I was simply taking a conservative approach to effort expended. Also, part of the reason I chose a time-structured encounter history was that I didn't have the full EH for late-season birds. Your suggestion of using periods for those birds and running median c-hat on the uncensored birds provides a great solution. I should note that once we have all the historical data brought into our new database, the right-censoring will not be necessary, because I'll have resight information from mid-August for every bird in the population and will be able to essentially close the bracket on all birds.

Thanks for your help. I should be able to move forward now, but I appreciate any comments you or others may have.

Here's a sample of the current .inp file with a few comments for clarity. Each record has a 15-occasion encounter history, a "1" for the group, dummy variables for 13 of the 14 years, and weight at day 11 when banded (followed by ID information).

Two birds from the SFLA97 territory. Both fledged at the earliest occasion and were searched for weekly until secondary banding at independence, which occurred at the next-to-last occasion; hence the periods on the end. The first bird likely lived until the August census and, if I had those data, would have a 1 in the final occasion. The second bird died after about 8 weeks out.
10010111111111. 1 1 0 0 0 0 0 0 0 0 0 0 0 0 51 3; /*1043-44220 ZWR- F SFLA 1997 */
10000111000000. 1 1 0 0 0 0 0 0 0 0 0 0 0 0 49.7 3; /*1043-44221 ZRn- U SFLA 1997 */

This single bird was banded during the second week and known to have fledged based on observations at the nest, but it was never actually seen again. I have definitive historical documentation of searches in weeks 3 and 4 with no sightings, and no further info, even though I suspect this territory was visited regularly throughout the rest of the summer to monitor for renesting.
.100........... 1 1 0 0 0 0 0 0 0 0 0 0 0 0 44.3 4; /*1043-44239 ZFn- U CACT 1997 */

Finally, two birds from FENC04, showing a third-week banding and the observation record afterward; note that for two weeks I have no record that the site was visited. Is filling these with zeros more acceptable?
..101.111111.11 1 0 0 0 0 0 0 0 1 0 0 0 0 0 35.2 3; /*1573-65980 CR-G M FENC 2004 */
..100.000000.00 1 0 0 0 0 0 0 0 1 0 0 0 0 0 36.5 3; /*1573-65981 CG-n F FENC 2004 */
birdman
 
Posts: 34
Joined: Wed Oct 24, 2007 4:14 pm

Re: GOF suggestions sought

Postby ToddArnold » Mon Aug 09, 2010 8:29 am

I'm guessing these are scrub jays. It sounds like the encounter histories will be of similar length (maybe a couple of intervals shorter) if you structure based on age (1-12 weeks of age) as opposed to time (weeks 1-15 of the field season). If you stick with the format you currently have, I'd add an individual covariate for "age at week 1"; this will be negative for all the fledglings that don't enter until later (check out Rotella's description of the nest survival module for similar examples; you'd assign age values of 1, 1, 0, -1, -1 for the 5 records you list). If you structure based on age, so that all EH records start with 1, then add a covariate for the week of the season in which they fledged, so you can look for temporal trends (i.e., does fledging success decline seasonally?).
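(In sketch form, the "age at week 1" covariate is just an offset from the entry occasion; `reference_occasion=2` is my choice to reproduce the values above, since the first survival parameter covers the nestling week:)

# Hypothetical helper: compute "age at week 1" from a time-structured EH.

def age_at_week1(eh, reference_occasion=2):
    entry = len(eh) - len(eh.lstrip(".")) + 1  # first occasion with data
    return reference_occasion - entry

# The five records above enter at occasions 1, 1, 2, 3, 3:
for eh in ["10010111111111.", "10000111000000.", ".100...........",
           "..101.111111.11", "..100.000000.00"]:
    print(age_at_week1(eh))  # -> 1, 1, 0, -1, -1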

I'd treat years as attribute groups rather than dummy variables (although I can understand not wanting to start out with 28 14-column PIMs). It would be easier to consider annual covariates (including trend models) if you do. But you can still "get there from here" with the way you've got it coded.

If you want to better model the detection process (i.e., understand when detection failure is due to no survey occurring, as opposed to failing to find a live bird), you could add occasion-specific covariates, one for every occasion in the encounter history, that are just dummy variables for "was a survey conducted?". An encounter history like 11011... matched up with occasion-specific covariates 1 1 0 1 1 ... would indicate that the detection failure on occasion 3 is due to logistical considerations (no survey), whereas 1 1 1 1 1 ... would indicate something related to the survey procedures or the bird's behavior. That might be worth doing if you wanted insight into how often you see a bird when you actually look for it, or to fit a tighter model of the detection process. The survey dummy variables could all be stacked in a single column in the design matrix, where they'd generate a massively positive beta coefficient, and the intercept would be reduced accordingly. Set all the covariate values to 1 using "user-specified covariate values" and you'd have an estimate of the detection probability conditional on a survey being conducted.
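(A toy numerical illustration of what that design-matrix trick does to p; the beta values here are invented, but the logit link is MARK's default:)

import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Stacked survey dummy gets a massively positive coefficient, and the
# intercept absorbs the "no survey" occasions (hypothetical values):
beta_intercept, beta_survey = -5.0, 6.5

print(inv_logit(beta_intercept))                # ~0.007: p when no survey occurred
print(inv_logit(beta_intercept + beta_survey))  # ~0.82: p conditional on a survey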

Censoring decisions are always tough, especially when you're going back and retroactively interpreting someone else's data. For your third data record, you have to ask, "would this bird be in the encounter history if it had survived longer?" The answer seems to be "yes", so you don't want to censor it just because it probably died. Without knowing more, I'd code it with a string of terminal zeros.

Sometimes with live-encounter data you know a bird is dead, but you can't really do anything with that information unless you adopt a Burnham or Barker formulation. I've got CJS data on fledging success of 1050 coot chicks where I know that 50 of them died because I recovered their carcasses, but adding in the mortalities with a Barker formulation really didn't give me any better estimates of survival than the simpler CJS model.
ToddArnold
 
Posts: 3
Joined: Thu Aug 27, 2009 5:20 pm
Location: University of Minnesota

Re: GOF suggestions sought

Postby birdman » Mon Aug 09, 2010 5:13 pm

Thanks for the feedback so far. This is really starting to erode my mental state... I restructured the data as you suggested, in an age format, with a covariate for fledge date to account for temporal effects. I also cleaned up the internal periods in the encounter histories to zeros, and right-censored only individuals that came into the population late in the season and lacked later sighting information; individuals known to have died were coded with trailing zeros. These birds don't move off the natal territory, so if we don't find them once they are a few weeks old (when resighting probability is very high), we can be reasonably certain they are dead. I also used years as attribute groups, because I needed to account for year in the general model due to obvious yearly variation.

I successfully ran median c-hat and got an estimate of 1.388, much better than I expected given my initial attempts. However, when I look at the deviance residual plot, I'm shocked. It appears that MARK is plotting observed and expected values for every individual in every year. So an individual in 1997 that was banded and never seen again (100000000000) has a plotted residual value not only for 1997 but for all the other years as well; the "above zero" portion of the plot looks reasonable, although there is some indication of structure, while the "below zero" portion has a huge amount of repeated structure.

Since I am ultimately attempting to assess support for various covariates besides fledge date and year, here are a couple of the full input records. Each bird has a 12-occasion history, a series of group indicators (these birds are all from 2010), 12 covariates, and an identifier. Is there something wrong with entering them this way that would account for the bizarre residual plot?
/*EncHist years97-10 Wgt Nst Hlprs JulFl Htch sprrain KBDI fr0 fr04 fr515 fr15+ frno Band */
111111111111 0 0 0 0 0 0 0 0 0 0 0 0 0 1 44.6 1 1 134 2 14.75 373 0 0 1 0 0; /* 1713-13654 */
11111111111. 0 0 0 0 0 0 0 0 0 0 0 0 0 1 38.9 1 1 135 3 14.75 377 0 0 1 0 0; /* 1713-13656 */
100000000000 0 0 0 0 0 0 0 0 0 0 0 0 0 1 39.2 1 1 135 3 14.75 377 0 0 1 0 0; /* 1713-13657 */
100001000000 0 0 0 0 0 0 0 0 0 0 0 0 0 1 38.3 1 1 135 3 14.75 377 0 0 1 0 0; /* 1713-13658 */
11011111111. 0 0 0 0 0 0 0 0 0 0 0 0 0 1 51.6 4 1 140 2 14.75 399 0 0 1 0 0; /* 1713-13661 */
birdman
 
Posts: 34
Joined: Wed Oct 24, 2007 4:14 pm

Re: GOF suggestions sought

Postby cooch » Mon Aug 09, 2010 5:36 pm

birdman wrote:Thanks for the feedback so far. This is really starting to erode my mental state... I restructured the data as you suggested, in an age format, with a covariate for fledge date to account for temporal effects.

<snip>

I successfully ran median c-hat and got an estimate of 1.388, much better than I expected given my initial attempts. However, when I look at the deviance residual plot, I'm shocked. It appears that MARK is plotting observed and expected values for every individual in every year. So an individual in 1997 that was banded and never seen again (100000000000) has a plotted residual value not only for 1997 but for all the other years as well; the "above zero" portion of the plot looks reasonable, although there is some indication of structure, while the "below zero" portion has a huge amount of repeated structure.


Have a look at the figure on p. 47 of Chapter 11 (the individual covariates chapter). If your .inp file (general model) does in fact contain individual covariates, then this figure probably looks familiar. In short (and as noted in that part of Chapter 11), deviance residual plots for models with one or more individual covariates are not informative.
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: GOF suggestions sought

Postby biaoceano » Wed Aug 11, 2010 9:19 pm

Hi, I have the same problem that you had in the beginning: I can't run median c-hat for my analysis. My original encounter history looks like the example below:

100000000....0000000000000000000000000000000000.00000000000000000.0.00000000000000000000000000000......000000000000000000 1 0 45.7;
100000000....0000000000000000000000000000000000.00000000000000000.0.00000000000000000000000000000......000000000000000000 1 0 50.3;
010000000....0001000000000000000000000000000000.00000000000000000.0.00000000000000000000000000000......000000000000000000 1 0 47.9;
010000000....0000000000000000000000000000000000.00000000000000000.0.00000000000000000000000000000......000000000000000000 1 0 64.3;


I have already made a new encounter history, dropping the individual covariate and the "." (null) values, but even so something is going wrong: I still can't run median c-hat.

new EH:

100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1 0 ;
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1 0 ;
010000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1 0 ;
010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1 0 ;

The background of my research: I study sea turtles, and since 2001 a group has been running monthly surveys. As you can see in the first 4 lines of the encounter history, I have really low recapture rates, which has led me to try different ways of analyzing the data (grouping the occasions into six-month periods, and into year-long periods).
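(The pooling I tried looks roughly like this; a minimal sketch where a pooled occasion is 1 if the turtle was caught in any month of the block:)

def pool_occasions(eh, block=6):
    """Collapse a monthly EH into blocks (e.g. six-month occasions):
    '1' if any month in the block is a capture, '.' if no month in the
    block was surveyed at all, otherwise '0'."""
    pooled = []
    for i in range(0, len(eh), block):
        chunk = eh[i:i + block]
        if "1" in chunk:
            pooled.append("1")
        elif set(chunk) <= {"."}:
            pooled.append(".")
        else:
            pooled.append("0")
    return "".join(pooled)

print(pool_occasions("100000......000000000000"))  # -> "1.00"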

The two groups are juveniles and adults; juveniles are more likely to be captured than adults, because adults usually migrate.

Any help and comments are more than welcome. Thank you :D
biaoceano
 
Posts: 1
Joined: Fri Feb 05, 2010 3:09 pm

Re: GOF suggestions sought

Postby cooch » Fri Aug 13, 2010 6:32 pm

biaoceano wrote:I study sea turtles, and since 2001 a group has been running monthly surveys. As you can see in the first 4 lines of the encounter history, I have really low recapture rates, which has led me to try different ways of analyzing the data (grouping the occasions into six-month periods, and into year-long periods).

The two groups are juveniles and adults; juveniles are more likely to be captured than adults, because adults usually migrate.

Any help and comments are more than welcome. Thank you :D


Quick thoughts (verbatim copied from another thread).

1/ If p < 0.15, then there are going to be significant limits to your inference. Period. There are some *big names* (whom I will let self-identify if they wish) who will suggest that for p < 0.15 you might as well give up, or, perhaps somewhat more optimistically, not expect much of interest. Further, quoting from Chapter 5 (the GOF chapter of the MARK book):

...if you are sure your model structure is correct, and despite your best efforts your c-hat is > 3, or if your model rankings change radically with even small changes in c-hat, then you might just have to accept that you don't have adequate data to do the analysis at all (or, perhaps in a more positive tone, that there are real limits to the inferences you can draw from your data). Unfortunately, your 'time, effort, and expense' are not reasonable justifications for pushing forward with an analysis if the data aren't up to it. Remember the basic credo: 'garbage in... garbage out'.


The same applies to (and is not unrelated to) the issues created by low encounter probabilities. In this case, you have almost no data, so it's no wonder you can't run median c-hat (or, I suspect, fit any other models of interest either). You might be able to get partway there by pooling occasions, but while pooling typically increases the effective encounter probability, it creates different problems (related to the time scale over which inference is made). Sorry, but there is no simple solution to really low encounter probability. David Anderson refers to the "big law": you should do everything in your power to improve encounter probability.

2/ If p > 0.15 (i.e., in the realm of "acceptable") but you're still having problems, then it is more than likely that your general model (for which you are trying to assess GOF) is overly parameterized *for your data*. Similar to point 1/, some will say that if you don't have sufficient data to test GOF for a time-dependent general model, then doing GOF for a less general model is not useful, since if your general model is already a reduced-parameter model, the model set is unlikely to contain even more reduced models that will be of much interest.

The questions of "how much data do you need?" and "what is the minimum useful encounter probability?" have been considered in great depth. Short of reading a fairly copious literature, you're advised to use simulation to estimate the "power" of your study design (see Appendix A on simulation in MARK).
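(A bare-bones version of that kind of simulation, outside MARK, just to see how little data survives a low p; constant phi and p are assumptions, and a real power analysis would fit candidate models to each simulated dataset:)

import random

def simulate_cjs(n_released, n_occasions, phi, p, seed=1):
    """Simulate CJS encounter histories for a single release cohort with
    constant survival (phi) and detection (p) probabilities."""
    rng = random.Random(seed)
    histories = []
    for _ in range(n_released):
        eh, alive = ["1"], True
        for _ in range(n_occasions - 1):
            alive = alive and (rng.random() < phi)
            eh.append("1" if alive and rng.random() < p else "0")
        histories.append("".join(eh))
    return histories

# With p = 0.1, most animals are never seen again after release, which is
# exactly the "almost no data" situation described above:
ehs = simulate_cjs(1000, 10, phi=0.8, p=0.1)
print(sum("1" not in eh[1:] for eh in ehs) / len(ehs))  # large fraction never re-encountered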
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

