B.K. Sandercock wrote:
4. Fixing the encounter rates to zero for the missing days would work, but it seems cumbersome. Say there are ten missing days in each of 15 years; I would have to go through the list of parameters for each of the 15 years to set each of 150 parameters to zero. If we lost day 17 in year 13, I would have to sort out the corresponding parameter number (1097?). I can see this would work, and maybe this is still the best option.
Paul Doherty came up with a neat approach for handling this (which, I agree, is otherwise cumbersome). Take all the parameters that will be fixed to zero, and in the PIM chart, drag them all over to the left, and then move everything else to the right by 1. Since that left-hand parameter in the PIM chart will always be in the same location for every model, and always fixed to 0, you never have to remember the index number.
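The PIM chart itself is a GUI thing, but the bookkeeping behind the trick is easy to illustrate. Here is a minimal sketch in Python (illustration only - this is not how MARK stores PIMs, and the function name is mine):
- Code: Select all
def renumber(n_occasions, missing):
    """Assign index 1 to every occasion whose parameter will be fixed
    to zero, and shift the estimable parameters to indices 2, 3, ..."""
    indices = {}
    next_free = 2                 # index 1 is reserved for the fixed-to-0 parameter
    for occ in range(1, n_occasions + 1):
        if occ in missing:
            indices[occ] = 1      # always parameter 1, always fixed to 0
        else:
            indices[occ] = next_free
            next_free += 1
    return indices

# e.g., 7 occasions, occasion 3 missed:
print(renumber(7, missing={3}))
# -> {1: 2, 2: 3, 3: 1, 4: 4, 5: 5, 6: 6, 7: 7}
The point is that the fixed-to-0 parameter is always index 1, for every model, no matter which (or how many) occasions are missing.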
B.K. Sandercock wrote:
5. I asked about the dot notation because it seemed like a potentially easier way to address the problem. If setting up the encounter histories in a relational database, it would potentially be a single step to saturate the days not sampled with periods ('.'), following the dot notation. If use of the dot notation would affect the estimation process and lead to different answers, then I agree that option 4 is probably best.
Well, whether it 'works' or not depends on how you define 'works'. Here is a quick and dirty demonstration. I simulated a data set in which the true generating model is phi(t)p(.): 7 occasions, with true survival alternating between 0.7 and 0.8 (i.e., phi(1)=0.7, phi(2)=0.8, phi(3)=0.7, phi(4)=0.8, ...), true p=0.5, and 1000 new individuals released per occasion.
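If you want to generate data of this flavour yourself, here is a minimal simulation sketch (in Python; the structure and names are just for illustration - any simulation tool will do the same job):
- Code: Select all
import random

PHI = [0.7, 0.8, 0.7, 0.8, 0.7, 0.8]   # survival over intervals 1..6
P = 0.5                                 # constant encounter probability
N_OCC = 7                               # sampling occasions
N_REL = 1000                            # newly marked individuals per release occasion

def simulate():
    histories = {}
    for release_occ in range(N_OCC - 1):          # releases at occasions 1..6
        for _ in range(N_REL):
            eh = ['0'] * N_OCC
            eh[release_occ] = '1'                 # marked on release
            for occ in range(release_occ + 1, N_OCC):
                if random.random() > PHI[occ - 1]:
                    break                         # died during the interval
                if random.random() < P:
                    eh[occ] = '1'                 # survived and was detected
            key = ''.join(eh)
            histories[key] = histories.get(key, 0) + 1
    return histories

for eh, count in sorted(simulate().items(), reverse=True):
    print(f'{eh} {count};')                       # MARK-style .inp records
Each release cohort is simply tracked forward: survive each interval with the appropriate phi, and, if still alive, get detected with probability p.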
Here are the estimates from fitting phi(t)p(.) (i.e., the true generating model) to the data:
- Code: Select all
full data set
Real Function Parameters of {phi(t)p(.)}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6926184 0.0204279 0.6512014 0.7311478
2:Phi 0.7884306 0.0176847 0.7516991 0.8210221
3:Phi 0.7040740 0.0151800 0.6734816 0.7329367
4:Phi 0.8112890 0.0157498 0.7784707 0.8402432
5:Phi 0.7178065 0.0155462 0.6863650 0.7472567
6:Phi 0.8086359 0.0206981 0.7647691 0.8459698
7:p 0.4973416 0.0072387 0.4831597 0.5115278
So, the estimates are pretty close to the parameters used in the simulation. Next, we pretend that occasion 3 was missed (for some reason). So, I took the data set, highlighted column 3 in the EH, and changed everything to a 0. Note that doing this will make some encounter histories '0000000', so I simply deleted those rows from the data file. Here are the data:
- Code: Select all
1100000 130;
1000000 441;
1101111 1;
1000110 6;
1001010 3;
1001100 8;
1100011 2;
1101100 14;
1100110 5;
1101000 19;
1101011 3;
1100111 5;
1001000 15;
1000000 64;
1001100 7;
1100010 5;
1100000 63;
1000100 9;
1100010 6;
1101000 16;
1001000 20;
1000100 12;
1101001 7;
1100101 2;
1000010 3;
1101110 4;
1001011 4;
1001110 5;
1001001 6;
1100101 6;
1001011 3;
1101010 5;
1001101 1;
1100100 9;
1101110 5;
1100100 9;
1101101 1;
1001110 3;
1000101 2;
1000001 5;
1101111 1;
1101100 8;
1000011 7;
1100011 3;
1100111 3;
1000110 4;
1101001 7;
1001101 4;
1000010 7;
1000111 2;
1000011 3;
1101010 2;
1100110 3;
1001010 1;
1001001 1;
1101101 2;
1100001 1;
1101011 2;
1000111 1;
1100001 1;
1000101 1;
1000001 1;
1001111 1;
0100000 168;
0100000 386;
0100011 10;
0101000 68;
0101000 50;
0101010 16;
0100100 29;
0101100 27;
0101001 9;
0101100 22;
0101001 9;
0101101 8;
0100101 13;
0100111 5;
0101011 5;
0101101 8;
0100100 34;
0100001 6;
0101110 12;
0100010 13;
0100001 9;
0100110 12;
0100010 11;
0100111 9;
0101110 6;
0100101 6;
0100110 9;
0101010 15;
0100011 11;
0101111 4;
0101011 6;
0101111 4;
0001000 152;
0001100 68;
0000101 28;
0000011 27;
0000001 24;
0001101 24;
0000010 35;
0001010 37;
0001110 31;
0000111 14;
0001111 21;
0001011 23;
0000100 63;
0001001 10;
0000110 15;
0001010 99;
0001011 53;
0001000 372;
0001100 213;
0001111 60;
0001110 93;
0001001 57;
0001101 53;
0000111 161;
0000110 194;
0000100 497;
0000101 148;
0000010 604;
0000011 396;
Now, there are 3 ways we can handle analyzing data with a 'missing occasion 3': (i) we delete column 3 from the EH, and set the interval between the new occasions 2 and 3 to 2; (ii) we run the data as is, with column 3 all zeros, and fix p(3)=0; or (iii) we simply edit the file, and put a '.' in place of the zeros in column 3 of the EH. I'll leave it to you to try this on your own (a scripted version of the three edits is sketched below), and will simply present the results.
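If you'd rather script the edits than do them by hand in an editor, all three manipulations are a couple of lines each. A sketch in Python (the sample records here are made up, not the data set above):
- Code: Select all
MISSING = 2   # occasion 3, 0-indexed

recs = [('1101111', 1), ('1010110', 6), ('0010000', 3)]   # made-up sample

def edit(recs, mode):
    out = []
    for eh, count in recs:
        left, right = eh[:MISSING], eh[MISSING + 1:]
        if mode == 'delete':       # (i) drop the column; set interval 2->4 to length 2 in MARK
            new = left + right
        elif mode == 'zero':       # (ii) zero the column; fix p(3)=0 in MARK
            new = left + '0' + right
        else:                      # (iii) 'dot' the column; p(3) handled implicitly
            new = left + '.' + right
        if '1' in new:             # drop histories that are now empty (e.g. '0000000')
            out.append((new, count))
    return out

for mode in ('delete', 'zero', 'dot'):
    print(mode, edit(recs, mode))
Note the last record, an individual seen only at occasion 3: under all three schemes it carries no information, and gets dropped (exactly the '0000000' rows deleted above).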
First, deleting the 3rd column of all zeros, and setting the interval to 2.
- Code: Select all
missing 3rd occasion - interval test
Real Function Parameters of {phi(t)p(.) - missing 3rd occasion - explicit interval}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927870 0.0236915 0.6445124 0.7371792
2:Phi 0.7431168 0.0112142 0.7205296 0.7644751
3:Phi 0.8208921 0.0166706 0.7858638 0.8512748
4:Phi 0.7160609 0.0160604 0.6835602 0.7464616
5:Phi 0.8185136 0.0219621 0.7714532 0.8576715
6:p 0.4884757 0.0088797 0.4710925 0.5058867
Next, keeping the 3rd column of all zeros, and fixing p(3)=0 (note: for this example, I created a parameter for the encounter probability that was constant over all occasions, but with a different index for p(3), which I fixed to 0). In other words...
- Code: Select all
INPUT --- group=1 p rows=6 cols=6 Triang;
INPUT --- 7 8 7 7 7 7;
INPUT --- 8 7 7 7 7;
INPUT --- 7 7 7 7;
INPUT --- 7 7 7;
INPUT --- 7 7;
INPUT --- 7;
Here are the results:
- Code: Select all
no detections at occasion 3 - fixed p=0
Real Function Parameters of {phi(t)p(2 p - one fixed=0)}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927868 0.0236915 0.6445123 0.7371790
2:Phi 0.7431186 31.977105 0.7437689E-142 1.0000000
3:Phi 0.7431149 31.976942 0.7472886E-142 1.0000000
4:Phi 0.8208920 0.0166706 0.7858638 0.8512748
5:Phi 0.7160610 0.0160604 0.6835603 0.7464617
6:Phi 0.8185135 0.0219621 0.7714531 0.8576714
7:p 0.4884757 0.0088797 0.4710925 0.5058867
8:p 0.0000000 0.0000000 0.0000000 0.0000000 Fixed
Finally, taking the 3rd column of all 0's, and making them all dots. Then, in theory, you don't need to fix p=0 for anything - it's implicit in the dot notation. Here are the results (note: this uses 7 occasions - meaning, the dot is read as an occasion):
- Code: Select all
dot - 7 occasions including dot
Real Function Parameters of {test}
95% Confidence Interval
Parameter Estimate Standard Error Lower Upper
-------------------------- -------------- -------------- -------------- --------------
1:Phi 0.6927869 0.0236915 0.6445123 0.7371791
2:Phi 0.7431168 0.0000000 0.7431168 0.7431168
3:Phi 0.7431168 0.0000000 0.7431168 0.7431168
4:Phi 0.8208921 0.0166706 0.7858639 0.8512749
5:Phi 0.7160609 0.0160604 0.6835603 0.7464616
6:Phi 0.8185135 0.0219621 0.7714531 0.8576714
7:p 0.4884757 0.0088797 0.4710925 0.5058867
OK, so before comparing results, note that true phi(2)=0.8 and true phi(3)=0.7, so the true product survival from occasion 2 -> 4 is 0.8*0.7=0.56. Note that sqrt(0.56)=0.74833. Meaning, even for 'perfect' data, given a missing occasion 3, the best we might expect to get is an estimate of phi(2)=phi(3)=0.74833.
What do we actually get? From above, for the model where we deleted column 3 and set the interval from 2 -> 4 to 2, we got phi-hat(2)=0.74312, which is pretty much bang on. For the case where, instead of explicitly coding the interval, we left in the column of 0's and simply fixed p(3)=0, we got phi-hat(2)=phi-hat(3)=0.74312 - again spot on, and identical (to within rounding error) to the 'explicit interval' approach. Meaning, for this example, it doesn't much matter which approach you use (interval specification, or fixing p=0 for some parameter). How does the 'dot notation' approach compare? Notice you have 6 estimates, meaning MARK is trying to estimate both phi(2) and phi(3), even though sampling occasion 3 is missing - no different, really, than fixing p(3)=0, where you also get 6 estimates for phi, with phi(2)=phi(3)~sqrt(0.56). For the dot notation, phi-hat(2)=phi-hat(3)=0.7431, which when multiplied together gives 0.552, pretty close to 0.56. So all three approaches are basically equivalent here.
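A quick numerical check of the arithmetic:
- Code: Select all
phi2, phi3 = 0.8, 0.7                  # true values from the simulation
print((phi2 * phi3) ** 0.5)            # 0.7483... -> the expected confounded estimate
print(0.7431168 ** 2)                  # 0.5522... -> close to the true product 0.56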
So, at least for this example, the dot notation 'works' - but it does not change the unequal-interval issues any more than the other approaches do. I still hold to the basic point that missing data = missing intervals = need to be careful. Estimation can be tricky, and so can interpretation - especially model selection (e.g., what is a 'time-invariant' model when sampling occasions are missing?).
And as for GOF testing with dots: not doable, either mechanically (MARK won't let you) or conceptually (although if you hard-coded the 'expectation' for cell frequencies under a dot sampling model, and then did a form of contingency testing against that, I can imagine how you could, in theory, get there from here - but it would be a lot of work).
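For what it's worth, the contingency-testing step itself would be mechanically trivial - if you somehow had the expected cell frequencies under a 'dot' sampling model in hand. A sketch of just that final comparison (the numbers are completely made up, deriving the expectations is the actual hard part, and this assumes scipy is available):
- Code: Select all
from scipy.stats import chisquare

# Hypothetical observed counts for a handful of encounter-history 'cells',
# and expected counts derived (somehow) under the dot-sampling model.
observed = [130, 441, 64, 19]
expected = [125.0, 450.0, 60.0, 19.0]
# chisquare requires sum(observed) == sum(expected); here both sum to 654.
stat, pval = chisquare(f_obs=observed, f_exp=expected)
print(stat, pval)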