c-hat for encounter histories with missing data (.)

questions concerning analysis/theory using program MARK


Postby thaljoha » Sun Mar 13, 2016 7:07 pm

I am fitting a multi-state model in MARK to 18 years of round-up data from bison. Each year, if an individual was on the island, it can be in one of three states (calf, pregnant, or not pregnant). Sometimes, however, data are missing, so I have a dot in that position. Also, for years before and after the individual was in the herd, I have zeros as placeholders. So, my file without covariates looks like this:

/* AED8979*/ 0000000000N0000000 1;
/* AED8990*/ 00C000000000000000 1;
/* AED8998*/ 0000N0000000000000 1;
/* AEF7403*/ 000000N00000000000 1;
/* AEF7410*/ 00000000N000000000 1;
/* AEF7420*/ 00000NN00000000000 1;
/* AEF7421*/ 00000C000000000000 1;
/* AEF7455*/ 00000P000000000000 1;
/* AEF7459*/ 00000N..P000000000 1;
/* AEF7483*/ 000000000N00000000 1;
/* AEF7493*/ 00000000000P000000 1;
/* AEF7500*/ 000000N00000000000 1;
/* AEG6907*/ 0000000000N0000000 1;
/* AEG6910*/ 0000CNPP.PPPPPPN00 1;
/* AEG6911*/ 00000000N000000000 1;
/* AEG6923*/ 0000C0000000000000 1;
/* AEG6925*/ 00000N000000000000 1;
/* AEG6952*/ 0000C.NPNPNNP00000 1;


I want to estimate c-hat for my basic model s(.)p(.)psi(g). Apparently, however, I am not able to estimate c-hat using the bootstrap or the median c-hat approach because of my missing data points (the dots). I have tried to use U-CARE, but it will not run on my computer; it crashes as soon as I open the .INP file. Any help would be greatly appreciated. Also, do I need to estimate c-hat when I have individual covariates in the model?

Thanks!
thaljoha
 
Posts: 4
Joined: Fri Feb 26, 2016 3:38 pm

Re: c-hat for encounter histories with missing data (.)

Postby cooch » Sun Mar 13, 2016 7:48 pm


First, I'm puzzled how you have missing data, since you have per-individual histories. The whole notion of a 'dot' makes sense in (say) occupancy, where you don't visit a site on a particular occasion (the 'dot' notation in the history was initially implemented for occupancy models). Is it really missing data, or is it simply that on a particular occasion, when the animal is encountered, you don't know its state? If the latter, then you need another approach entirely. Handling uncertain states in a multi-state (MS) context is a 'growth industry' in this field; you'll probably end up using a hidden Markov model (HMM), or twisting the robust design (RD) with hidden states data type into working for you.

Failing that, if you really do have missing values, and there aren't many of them (say, <5% of your data), then a quick-and-dirty approach is to (i) drop the individuals with 'dots' from the file, (ii) fit your general model, and (iii) run the GOF test on that model. I've found that if there are relatively few 'problem histories', this works well enough to pass the sniff test.
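A minimal Python sketch of step (i), assuming the .inp layout shown in the first post (an optional /* ID */ comment, then the history, then the frequency). It also returns the fraction of histories that would be dropped, which is worth checking before going this route. The function names are illustrative only; they are not part of MARK.

```python
import re

# Matches a MARK-style /* ... */ comment (e.g. the animal ID).
COMMENT = re.compile(r"/\*.*?\*/")


def history_of(line):
    """Return the encounter-history token of a .inp line, or None if the
    line holds no data (blank or comment-only)."""
    tokens = COMMENT.sub("", line).split()
    return tokens[0] if tokens else None


def split_by_dots(lines):
    """Split .inp lines into (kept, dropped): histories containing a '.'
    are dropped. Lines without a history are discarded."""
    kept, dropped = [], []
    for line in lines:
        history = history_of(line)
        if history is None:
            continue
        (dropped if "." in history else kept).append(line)
    return kept, dropped


def drop_dot_histories(src_path, dst_path):
    """Write a copy of src_path without the 'dot' histories and return
    the fraction of histories that were dropped."""
    with open(src_path) as src:
        kept, dropped = split_by_dots(src.readlines())
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    total = len(kept) + len(dropped)
    return len(dropped) / total if total else 0.0
```

In the 18 histories shown in the first post, three (AEF7459, AEG6910, AEG6952) contain a dot and would be removed.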

Second, U-CARE requires numeric states (or used to; I presume it still does), not letters. So, try recoding your input file: instead of N, C, P, use, say, 1, 2, 3. A global search and replace takes care of this. This is actually noted on p. 47 of Chapter 10:

.... The only thing you need to do is make sure that all of your ‘state coding’ is numeric (e.g., if you use ‘B’ and ‘N’ in your input file, for example, you’ll need to change them to numbers, say 1 for ‘N’, and 2 for ‘B’...U-CARE cannot currently handle letters for state coding in the input file).
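A sketch of that recoding in Python. One design choice differs from a blind global search and replace: it rewrites only the history token, so letters inside the /* ID */ comments are left alone. The letter-to-digit mapping is a hypothetical example; pick your own and use it consistently. Dots are passed through unchanged, since recoding does nothing about them.

```python
import re

# Hypothetical mapping for the bison states; choose your own digits.
STATE_CODES = {"N": "1", "C": "2", "P": "3"}

# Optional leading /* ... */ comment, then the history token, then the rest.
LINE = re.compile(r"^(\s*(?:/\*.*?\*/)?\s*)([0-9A-Za-z.]+)(.*)$", re.DOTALL)


def recode_line(line, codes=STATE_CODES):
    """Replace letter state codes with digits in the history token only."""
    match = LINE.match(line)
    if match is None:  # blank or comment-only line: leave untouched
        return line
    prefix, history, rest = match.groups()
    recoded = "".join(codes.get(ch, ch) for ch in history)
    return prefix + recoded + rest
```

Applied line by line to the .inp file, this turns, for example, `/* AEG6910*/ 0000CNPP.PPPPPPN00 1;` into `/* AEG6910*/ 00002133.333333100 1;`.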


However, having said that, I rather doubt U-CARE can handle 'dots' either.

And, as for individual covariates, as noted in Chapter 11,

...the recommended approach is to perform GOF testing on the most general model that does not include the individual covariates, and use the c-hat value for this general model on all of the other models, even those including individual covariates. If individual covariates will serve to reduce (or at least explain) some of the variation, then this would imply that the c-hat from the general model without the covariates is likely to be too high, and thus, the analysis using this c-hat will be ‘somewhat conservative’. So, keep this in mind...
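To make 'somewhat conservative' concrete: the standard quasi-likelihood adjustments divide −2 ln L by c-hat when computing QAICc, and inflate standard errors by √c-hat. A c-hat that is too high therefore widens confidence intervals and shrinks the fit term relative to the parameter penalty, which favours simpler models. The sketch below just evaluates those two formulas; the inputs are made-up numbers, not MARK output.

```python
import math


def qaicc(neg2_log_lik, k, n, c_hat):
    """QAICc = (-2 ln L)/c_hat + 2K + 2K(K+1)/(n - K - 1).
    With c_hat = 1 this reduces to ordinary AICc."""
    if n - k - 1 <= 0:
        raise ValueError("need n > K + 1")
    return neg2_log_lik / c_hat + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def adjusted_se(se, c_hat):
    """Quasi-likelihood standard error: SE * sqrt(c_hat)."""
    return se * math.sqrt(c_hat)


# Hypothetical numbers: the same model evaluated at two c-hat values.
for c in (1.0, 2.0):
    print(c, round(qaicc(500.0, 6, 300, c), 2), round(adjusted_se(0.05, c), 4))
```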


To get there from here, you need to make a copy of your input file without the covariates, fit your general model to it, and then look at GOF for this model. Take the c-hat from that analysis, and use it for your analysis of the 'full data' (i.e., the .inp file containing the individual covariates).
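A sketch of the 'make a copy without the covariates' step, assuming the usual .inp layout: optional /* ID */ comment, the history, one frequency column per group, then the individual covariates, with the line ending in a semicolon. `strip_covariates` and its `n_groups` argument are hypothetical names for this illustration, not MARK settings; the file in the first post has a single frequency column, so `n_groups=1`.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/")


def strip_covariates(line, n_groups=1):
    """Keep the comment, the history and the n_groups frequency columns of a
    .inp line; drop everything after them (the individual covariates)."""
    comments = " ".join(COMMENT.findall(line))
    tokens = COMMENT.sub("", line).replace(";", " ").split()
    if not tokens:  # blank or comment-only line
        return line
    kept = " ".join(tokens[: 1 + n_groups])
    return (comments + " " if comments else "") + kept + ";\n"
```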
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: c-hat for encounter histories with missing data (.)

Postby thaljoha » Mon Mar 14, 2016 3:57 pm

Thank you so much for your help. Unfortunately, however, there are numerous individuals with an encounter history that contains a dot, and many of them are the animals with the longest histories (so I really don't want to delete them). There is a dot in the encounter history when, during round-up, the vet was not able to obtain a pregnancy status for the animal (for unknown reasons). Is there any way that I can still run a model with "missing data" (aka dots)?

What if I put in a U for unknown? How will this change my analyses if I have to add another state to the model (one that is really of no interest)?

Re: c-hat for encounter histories with missing data (.)

Postby cooch » Mon Mar 14, 2016 5:26 pm

In other words, as I suggested above, you have encounters where the true state isn't observed. This is not uncommon.

State uncertainty is all the rage these days. Rather than get the state right in the field, we simply assume statistical tools will solve the problem for us. ;-)

Seriously, handling uncertain states is definitely doable -- as a starting point, have a look at

Conn, P.B. & E.G. Cooch. (2009) Multistate capture-recapture analysis under imperfect state observation: an application to disease models. Journal of Applied Ecology, 46, 486-492.


The approach taken in that paper doesn't use MARK (rather, E-SURGE), but more important, it lays out the problem.

Jeff Laake has shown how you can get there from here, using an R package he and colleagues have developed called 'marked' (named to distinguish it from the R package 'unmarked', which relates to analysis of unreplicated point counts, among other things, including occupancy). See

Laake, J. L. 2013. Capture-recapture analysis with hidden Markov models. AFSC Processed Rep. 2013-04, 34 p. Alaska Fish. Sci. Cent., NOAA, Natl. Mar. Fish. Serv., 7600 Sand Point Way NE, Seattle WA 98115.


Finally, there are several robust design models in MARK that explicitly handle types of state uncertainty (courtesy of Bill Kendall). While your data aren't RD, you can often trick MARK into thinking you have RD data by padding each capture/encounter occasion with some '0's to create fake secondary samples. I don't know how well this would work for you, but it is perhaps worth a shot, if you want to stay 'in MARK'.
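The padding trick can be sketched as below, assuming each real occasion becomes the first of `n_secondary` fake secondary samples, with the remaining secondaries filled with '0'. The function is hypothetical; how you then declare the matching number of secondaries, and how the padded samples should be parameterized in MARK's RD data types, is not covered here.

```python
def pad_to_fake_rd(history, n_secondary):
    """Expand each occasion into n_secondary fake secondary samples: the
    original character followed by (n_secondary - 1) zeros."""
    if n_secondary < 1:
        raise ValueError("n_secondary must be at least 1")
    return "".join(ch + "0" * (n_secondary - 1) for ch in history)


print(pad_to_fake_rd("0000C.NP", 2))
```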

In summary, don't simply turn the dots into a new state ('U'). A student of mine went that route some years back, and the Conn & Cooch paper cited above explains why this is inefficient (i.e., there is a better way).
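One way to see why the dots need not become a state: in an HMM (multievent-style) formulation, 'seen, but pregnancy status not determined' is an observation that any live state can produce, not a biological state. The minimal sketch below, with hypothetical parameter names and values, computes the likelihood of one history with the forward algorithm: constant survival `phi`, detection `p`, a 3×3 transition matrix `psi` among C, P and N, and `delta`, the probability that a detected animal's state is recorded. It illustrates the idea only; it is not the parameterization used by Conn & Cooch, E-SURGE or 'marked'.

```python
STATES = ["C", "P", "N", "dead"]     # hidden states
EVENTS = ["0", "C", "P", "N", "."]   # what can be written in a history


def transition_matrix(phi, psi):
    """4x4 matrix: survive with prob phi, then move among C/P/N using psi
    (3x3, rows summing to 1); 'dead' is absorbing."""
    gamma = [[phi * psi[i][j] for j in range(3)] + [1.0 - phi] for i in range(3)]
    gamma.append([0.0, 0.0, 0.0, 1.0])
    return gamma


def observation_matrix(p, delta):
    """4x5 matrix of P(event | state). A live animal is missed ('0') with
    prob 1 - p; if seen, its state is recorded with prob delta, otherwise
    it is written as '.' (seen, state unknown). Dead animals give '0'."""
    b = []
    for s in range(3):
        row = [1.0 - p, 0.0, 0.0, 0.0, p * (1.0 - delta)]
        row[1 + s] = p * delta
        b.append(row)
    b.append([1.0, 0.0, 0.0, 0.0, 0.0])
    return b


def history_likelihood(history, phi, p, psi, delta):
    """Forward algorithm, conditional on the first capture, whose state is
    assumed known (a simplification of this sketch)."""
    gamma = transition_matrix(phi, psi)
    b = observation_matrix(p, delta)
    first = next((i for i, ch in enumerate(history) if ch != "0"), None)
    if first is None or history[first] not in "CPN":
        raise ValueError("sketch assumes a first capture with known state")
    alpha = [0.0] * 4
    alpha[STATES.index(history[first])] = 1.0
    for ch in history[first + 1:]:
        e = EVENTS.index(ch)
        # predict the next hidden state, then weight by P(event | state)
        alpha = [sum(alpha[i] * gamma[i][j] for i in range(4)) * b[j][e]
                 for j in range(4)]
    return sum(alpha)


# Hypothetical values; calves become pregnant or not pregnant.
PSI = [[0.0, 0.3, 0.7],
       [0.0, 0.6, 0.4],
       [0.0, 0.5, 0.5]]
print(history_likelihood("0000C.NPNPNNP00000", phi=0.9, p=0.8, psi=PSI, delta=0.95))
```

Because '.' has probability p(1 − delta) under every live state, those encounters still inform survival and detection, while the pregnancy state on that occasion is inferred from the neighbouring occasions.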

