B.K. Sandercock wrote:
4. Fixing the encounter rates to zero for the missing days would work, but it seems cumbersome. Say there are ten missing days in each of 15 years; I would have to go through the list of parameters for each of the 15 years to set each of 150 parameters to zero. If we lost day 17 in year 13, I would have to sort out the corresponding parameter number (1097?). I can see this would work, and maybe this is still the best option.
Paul Doherty came up with a neat approach for handling this (which, I agree, is otherwise cumbersome). Take all the parameters that will be fixed to zero, and in the PIM chart, drag them all over to the left, and then move everything else to the right by 1. Since that left-hand parameter in the PIM chart will always be in the same location for every model, and always fixed to 0, you never have to remember the index number.
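The PIM chart itself is a GUI thing, but the bookkeeping behind the trick is easy to illustrate. Here is a minimal sketch in Python (illustration only - this is not how MARK stores PIMs, and the function name is mine):
- Code: Select all
def renumber(n_occasions, missing):
    """Assign index 1 to every occasion whose parameter will be fixed
    to zero, and shift the estimable parameters to indices 2, 3, ..."""
    indices = {}
    next_free = 2                 # index 1 is reserved for the fixed-to-0 parameter
    for occ in range(1, n_occasions + 1):
        if occ in missing:
            indices[occ] = 1      # always parameter 1, always fixed to 0
        else:
            indices[occ] = next_free
            next_free += 1
    return indices

# e.g., 7 occasions, occasion 3 missed:
print(renumber(7, missing={3}))
# -> {1: 2, 2: 3, 3: 1, 4: 4, 5: 5, 6: 6, 7: 7}
The point is that the fixed-to-0 parameter is always index 1, for every model, no matter which (or how many) occasions are missing.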
B.K. Sandercock wrote:
5. I asked about the dot notation because it seemed like a potentially easier way to address the problem. If setting up the encounter histories in a relational database, it would potentially be a single step to saturate the days not sampled with periods ('.'), following the dot notation. If use of the dot notation would affect the estimation process and lead to different answers, then I agree that option 4 is probably best.
Well, whether it 'works' or not depends on how you define 'works'. Here is a quick and dirty demonstration. I simulated a data set in which the true generating model is phi(t)p(.): 7 occasions, with true survival alternating between 0.7 and 0.8 (i.e., phi(1)=0.7, phi(2)=0.8, phi(3)=0.7, phi(4)=0.8, ...), true p=0.5, and 1000 new individuals released per occasion.
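If you want to generate data of this flavour yourself, here is a minimal simulation sketch (in Python; the structure and names are just for illustration - any simulation tool will do the same job):
- Code: Select all
import random

PHI = [0.7, 0.8, 0.7, 0.8, 0.7, 0.8]   # survival over intervals 1..6
P = 0.5                                 # constant encounter probability
N_OCC = 7                               # sampling occasions
N_REL = 1000                            # newly marked individuals per release occasion

def simulate():
    histories = {}
    for release_occ in range(N_OCC - 1):          # releases at occasions 1..6
        for _ in range(N_REL):
            eh = ['0'] * N_OCC
            eh[release_occ] = '1'                 # marked on release
            for occ in range(release_occ + 1, N_OCC):
                if random.random() > PHI[occ - 1]:
                    break                         # died during the interval
                if random.random() < P:
                    eh[occ] = '1'                 # survived and was detected
            key = ''.join(eh)
            histories[key] = histories.get(key, 0) + 1
    return histories

for eh, count in sorted(simulate().items(), reverse=True):
    print(f'{eh} {count};')                       # MARK-style .inp records
Each release cohort is simply tracked forward: survive each interval with the appropriate phi, and, if still alive, get detected with probability p.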
Here are the estimates from fitting phi(t)p(.) (i.e., the true generating model) to the data:
- Code: Select all
full data set
Real Function Parameters of {phi(t)p(.)}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6926184 0.0204279 0.6512014 0.7311478
2:Phi 0.7884306 0.0176847 0.7516991 0.8210221
3:Phi 0.7040740 0.0151800 0.6734816 0.7329367
4:Phi 0.8112890 0.0157498 0.7784707 0.8402432
5:Phi 0.7178065 0.0155462 0.6863650 0.7472567
6:Phi 0.8086359 0.0206981 0.7647691 0.8459698
7:p 0.4973416 0.0072387 0.4831597 0.5115278
So, the estimates are pretty close to the parameters used in the simulation. Next, we pretend that occasion 3 was missed (for some reason). So, I took the data set, highlighted column 3 in the EH, and changed everything to a 0. Note that doing this will make some encounter histories '0000000', so I simply deleted those rows from the data file. Here are the data:
- Code: Select all
1100000 130;
1000000 441;
1101111 1;
1000110 6;
1001010 3;
1001100 8;
1100011 2;
1101100 14;
1100110 5;
1101000 19;
1101011 3;
1100111 5;
1001000 15;
1000000 64;
1001100 7;
1100010 5;
1100000 63;
1000100 9;
1100010 6;
1101000 16;
1001000 20;
1000100 12;
1101001 7;
1100101 2;
1000010 3;
1101110 4;
1001011 4;
1001110 5;
1001001 6;
1100101 6;
1001011 3;
1101010 5;
1001101 1;
1100100 9;
1101110 5;
1100100 9;
1101101 1;
1001110 3;
1000101 2;
1000001 5;
1101111 1;
1101100 8;
1000011 7;
1100011 3;
1100111 3;
1000110 4;
1101001 7;
1001101 4;
1000010 7;
1000111 2;
1000011 3;
1101010 2;
1100110 3;
1001010 1;
1001001 1;
1101101 2;
1100001 1;
1101011 2;
1000111 1;
1100001 1;
1000101 1;
1000001 1;
1001111 1;
0100000 168;
0100000 386;
0100011 10;
0101000 68;
0101000 50;
0101010 16;
0100100 29;
0101100 27;
0101001 9;
0101100 22;
0101001 9;
0101101 8;
0100101 13;
0100111 5;
0101011 5;
0101101 8;
0100100 34;
0100001 6;
0101110 12;
0100010 13;
0100001 9;
0100110 12;
0100010 11;
0100111 9;
0101110 6;
0100101 6;
0100110 9;
0101010 15;
0100011 11;
0101111 4;
0101011 6;
0101111 4;
0001000 152;
0001100 68;
0000101 28;
0000011 27;
0000001 24;
0001101 24;
0000010 35;
0001010 37;
0001110 31;
0000111 14;
0001111 21;
0001011 23;
0000100 63;
0001001 10;
0000110 15;
0001010 99;
0001011 53;
0001000 372;
0001100 213;
0001111 60;
0001110 93;
0001001 57;
0001101 53;
0000111 161;
0000110 194;
0000100 497;
0000101 148;
0000010 604;
0000011 396;
Now, there are 3 ways we can handle analyzing data with a 'missing occasion 3': (i) we delete column 3 from the EH, and set the interval between the new occasions 2 and 3 to 2; (ii) we run the data as is, with column 3 all zeros, and fix p(3)=0; or (iii) we simply edit the file, and put a '.' in place of the zeros in column 3 of the EH. I'll leave it to you to try this on your own (a scripted version of the three edits is sketched below), and will simply present the results.
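If you'd rather script the edits than do them by hand in an editor, all three manipulations are a couple of lines each. A sketch in Python (the sample records here are made up, not the data set above):
- Code: Select all
MISSING = 2   # occasion 3, 0-indexed

recs = [('1101111', 1), ('1010110', 6), ('0010000', 3)]   # made-up sample

def edit(recs, mode):
    out = []
    for eh, count in recs:
        left, right = eh[:MISSING], eh[MISSING + 1:]
        if mode == 'delete':       # (i) drop the column; set interval 2->4 to length 2 in MARK
            new = left + right
        elif mode == 'zero':       # (ii) zero the column; fix p(3)=0 in MARK
            new = left + '0' + right
        else:                      # (iii) 'dot' the column; p(3) handled implicitly
            new = left + '.' + right
        if '1' in new:             # drop histories that are now empty (e.g. '0000000')
            out.append((new, count))
    return out

for mode in ('delete', 'zero', 'dot'):
    print(mode, edit(recs, mode))
Note the last record, an individual seen only at occasion 3: under all three schemes it carries no information, and gets dropped (exactly the '0000000' rows deleted above).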
First, deleting the 3rd column of all zeros, and setting the interval to 2.
- Code: Select all
missing 3rd occasion - interval test
Real Function Parameters of {phi(t)p(.) - missing 3rd occasion - explicit interval}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927870 0.0236915 0.6445124 0.7371792
2:Phi 0.7431168 0.0112142 0.7205296 0.7644751
3:Phi 0.8208921 0.0166706 0.7858638 0.8512748
4:Phi 0.7160609 0.0160604 0.6835602 0.7464616
5:Phi 0.8185136 0.0219621 0.7714532 0.8576715
6:p 0.4884757 0.0088797 0.4710925 0.5058867
Next, keeping the 3rd column of all zeros, and fixing p(3)=0 (note: for this example, I created a parameter for the encounter probability that was constant over all occasions, but with a different index for p(3), which I fixed to 0). In other words...
- Code: Select all
INPUT --- group=1 p rows=6 cols=6 Triang;
INPUT --- 7 8 7 7 7 7;
INPUT --- 8 7 7 7 7;
INPUT --- 7 7 7 7;
INPUT --- 7 7 7;
INPUT --- 7 7;
INPUT --- 7;
Here are the results:
- Code: Select all
no detections at occasion 3 - fixed p=0
Real Function Parameters of {phi(t)p(2 p - one fixed=0)}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927868 0.0236915 0.6445123 0.7371790
2:Phi 0.7431186 31.977105 0.7437689E-142 1.0000000
3:Phi 0.7431149 31.976942 0.7472886E-142 1.0000000
4:Phi 0.8208920 0.0166706 0.7858638 0.8512748
5:Phi 0.7160610 0.0160604 0.6835603 0.7464617
6:Phi 0.8185135 0.0219621 0.7714531 0.8576714
7:p 0.4884757 0.0088797 0.4710925 0.5058867
8:p 0.0000000 0.0000000 0.0000000 0.0000000 Fixed
Finally, taking the 3rd column of all 0's, and making them all dots. Then, in theory, you don't need to fix p=0 for anything - it's implicit in the dot notation. Here are the results (note: this uses 7 occasions - meaning, the dot is read as an occasion):
- Code: Select all
dot - 7 occasions including dot
Real Function Parameters of {test}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927869 0.0236915 0.6445123 0.7371791
2:Phi 0.7431168 0.0000000 0.7431168 0.7431168
3:Phi 0.7431168 0.0000000 0.7431168 0.7431168
4:Phi 0.8208921 0.0166706 0.7858639 0.8512749
5:Phi 0.7160609 0.0160604 0.6835603 0.7464616
6:Phi 0.8185135 0.0219621 0.7714531 0.8576714
7:p 0.4884757 0.0088797 0.4710925 0.5058867
OK, so before comparing results, note that true phi(2)=0.8 and true phi(3)=0.7, so the true product survival from occasion 2 -> 4 is 0.8*0.7=0.56. Note that sqrt(0.56)=0.74833. Meaning, even for 'perfect' data, given a missing occasion 3, the best we might expect to get is an estimate of phi(2)=phi(3)=0.74833.
What do we actually get? From above, for the model where we deleted column 3 and set the interval from 2 -> 4 to 2, we got phi-hat(2)=0.74312, which is pretty much bang on. For the case where, instead of explicitly coding the interval, we left in the column of 0's and simply fixed p(3)=0, we got phi-hat(2)=phi-hat(3)=0.74312 - again spot on, and identical (to within rounding error) to the 'explicit interval' approach. Meaning, for this example, it doesn't much matter which approach you use (interval specification, or fixing p=0 for some parameter). How does the 'dot notation' approach compare? Notice you have 6 estimates, meaning MARK is trying to estimate both phi(2) and phi(3), even though sampling occasion 3 is missing - no different, really, than fixing p(3)=0, where you also get 6 estimates for phi, with phi(2)=phi(3)~sqrt(0.56). For the dot notation, phi-hat(2)=phi-hat(3)=0.7431, which when multiplied together gives 0.552, pretty close to 0.56. So all three approaches are basically equivalent here.
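A quick numerical check of the arithmetic:
- Code: Select all
phi2, phi3 = 0.8, 0.7                  # true values from the simulation
print((phi2 * phi3) ** 0.5)            # 0.7483... -> the expected confounded estimate
print(0.7431168 ** 2)                  # 0.5522... -> close to the true product 0.56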
So, at least for this example, the dot notation 'works' - but it does not change the unequal-interval issues any more than the other approaches do. I still hold to the basic point that missing data = missing intervals = need to be careful. Estimation can be tricky, and so can interpretation - especially model selection (e.g., what is a 'time-invariant' model when sampling occasions are missing?).
And as for GOF testing with dots: not doable, either mechanically (MARK won't let you) or conceptually (although if you hard-coded the 'expectation' for cell frequencies under a dot sampling model, and then did a form of contingency testing against that, I can imagine how you could, in theory, get there from here - but it would be a lot of work).
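For what it's worth, the contingency-testing step itself would be mechanically trivial - if you somehow had the expected cell frequencies under a 'dot' sampling model in hand. A sketch of just that final comparison (the numbers are completely made up, deriving the expectations is the actual hard part, and this assumes scipy is available):
- Code: Select all
from scipy.stats import chisquare

# Hypothetical observed counts for a handful of encounter-history 'cells',
# and expected counts derived (somehow) under the dot-sampling model.
observed = [130, 441, 64, 19]
expected = [125.0, 450.0, 60.0, 19.0]
# chisquare requires sum(observed) == sum(expected); here both sum to 654.
stat, pval = chisquare(f_obs=observed, f_exp=expected)
print(stat, pval)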