www.phidot.org

by **simone77** » Wed Oct 19, 2011 6:39 am

Here is a case study of estimability problems in a CJS model. I have tried to apply what I have learned from this forum and from the Gentle Introduction to MARK. I hope it is of some interest to others and that someone can teach me something more.

To my knowledge, MARK provides two tools for handling estimability problems: (i) data cloning (Appendix F) and (ii) simulated annealing (Ch. 10; 10.4.3 sidebar). Beyond these, the manual suggests some good practices to follow when analyzing CMR data.

Estimability problems can arise in several situations: (a) a particular model structure (e.g. confounded terminal parameters in CJS), (b) boundary estimates (near 0 or 1), (c) multiple optima in the likelihood function and (d) data sparseness.

These "good practices" are summarized in Ch. 10 (10.4.3; p. 10-38), which says that:
1) In some situations (particularly with multi-state models), it is recommended to start with a simpler model and use its constant-parameter estimates as starting values. This can help with numerical estimation problems, in particular with multiple optima in the likelihood function.
2) In some cases it is more efficient to use another link function. For instance, with boundary estimates the sin link can be more efficient than the logit link, even though the sin link is not so good for estimating CIs when parameter values are near the boundary, and there are several situations where you cannot use the sin link at all (besides the manual, there are quite a few posts by Gary White on this, such as http://www.phidot.org/forum/viewtopic.php?f=1&t=432&p=985&hilit=+boundary#p985).

My analysis is a CJS with an individual covariate (weight), 9 occasions and two groups (males and females) of rabbits in an enclosure. This is a summary of my encounter history:

My best model seems to be {Phi(weight*t) P(sex*t)} (see http://www.phidot.org/forum/viewtopic.php?f=1&t=1966 for some details). I have some trouble with Phi4 and Phi5 as shown below.

One possibility is that there are multiple optima in the likelihood. This can be tested by using MCMC when running the model, as explained in Ch. 10, and a further appendix on this should be available soon. To my shame, I haven't understood how to create histogram plots like those in the GI (10-42 to 10-44). In any case, I have tried setting different starting values when running the model and it didn't do the trick in my case (no change in the results).
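The idea of trying several starting points can be sketched outside MARK. What follows is a minimal Python illustration, not MARK's optimizer: `objective` is a made-up one-parameter function with two local minima standing in for a negative log-likelihood, and `local_search` is a plain fixed-step gradient descent chosen only for simplicity. All names and values are assumptions for illustration.

```python
def objective(x):
    # Toy stand-in for a negative log-likelihood with two local minima.
    return (x ** 2 - 1.0) ** 2 + 0.3 * x


def gradient(x):
    # Analytical derivative of the toy objective.
    return 4.0 * x * (x ** 2 - 1.0) + 0.3


def local_search(start, step=0.01, iterations=5000):
    # Fixed-step gradient descent: converges to the nearest local minimum.
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def multi_start(starts):
    # Run the local search from each start; sort fits by objective value.
    return sorted((objective(x), x) for x in map(local_search, starts))


fits = multi_start([-2.0, -0.5, 0.5, 2.0])
for value, x in fits:
    print(f"minimum at x = {x:.3f}, objective = {value:.3f}")
best_value, best_x = fits[0]
```

In this toy case the starts split between two different minima, which is the signature of multiple optima. If every start lands on the same solution, as reported above for the rabbit data, multiple optima become a less likely explanation.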

I have tried other link functions, but that didn't do the trick either. On this point, you cannot use every link for every analysis. As stated in another post from Gary White, "...The way to figure out if you screwed up is to run the model with both the logit and then sin links, and if the estimates and deviance differ, you probably should not have been running the sin link...". In this case I believe I can't use the sin link because I have an individual covariate and thus there is not a monotonic relationship. Am I wrong? (The above rule of thumb tells me I should not use the sin link in this case.)
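A small numerical check of the monotonicity point, as a sketch. It assumes the link definitions given in the MARK book, sin link theta = (sin(eta) + 1)/2 and logit link theta = 1/(1 + exp(-eta)), with a linear predictor eta = b0 + b1*weight. The coefficients and covariate values are hypothetical.

```python
import math


def logit_link(eta):
    # Back-transform for the logit link.
    return 1.0 / (1.0 + math.exp(-eta))


def sin_link(eta):
    # Back-transform for the sin link (assumed form: (sin(eta) + 1) / 2).
    return (math.sin(eta) + 1.0) / 2.0


def is_monotonic(values):
    # True if the sequence never decreases.
    return all(b >= a for a, b in zip(values, values[1:]))


b0, b1 = 0.5, 0.8                       # hypothetical betas
weights = [0.5 * i for i in range(13)]  # hypothetical covariate values 0.0 .. 6.0
etas = [b0 + b1 * w for w in weights]

logit_values = [logit_link(e) for e in etas]
sin_values = [sin_link(e) for e in etas]

print("logit monotonic:", is_monotonic(logit_values))
print("sin monotonic:  ", is_monotonic(sin_values))
```

Because the sine function turns around once the linear predictor passes pi/2, survival under the sin link first rises and then falls as the covariate increases, whereas the logit link keeps increasing. This is consistent with the advice not to combine the sin link with an individual covariate.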

I have tried to use data cloning (number of replicates=100) and I get this:

This should indicate that parameters 4 and 5 are not estimable (although the terminal parameters seem identifiable; more on this in http://www.phidot.org/forum/viewtopic.php?f=34&t=1482), judging by the SE ratio (with 100 clones it should be about 10, the square root of the number of clones).
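The SE-ratio check can be written down as a short sketch. It assumes the usual data cloning rule that, for an estimable parameter, the SE from the cloned data is roughly the original SE divided by the square root of the number of clones. The function name, the 20% tolerance and the standard errors below are hypothetical and are not taken from the analysis in this thread.

```python
import math


def flag_non_estimable(se_original, se_cloned, n_clones, tolerance=0.2):
    """Return {parameter: True if its SE ratio is far from sqrt(n_clones)}."""
    expected = math.sqrt(n_clones)
    flags = {}
    for name, se0 in se_original.items():
        se_k = se_cloned[name]
        if se_k <= 0.0:
            # A zero SE after cloning cannot give a meaningful ratio.
            flags[name] = True
            continue
        ratio = se0 / se_k
        flags[name] = abs(ratio - expected) / expected > tolerance
    return flags


# Hypothetical standard errors, for illustration only.
se_original = {"phi3": 0.050, "phi4": 0.0002}
se_cloned = {"phi3": 0.0051, "phi4": 0.0002}

flags = flag_non_estimable(se_original, se_cloned, n_clones=100)
print(flags)
```

Here the hypothetical `phi3` shrinks by a factor close to 10 and is not flagged, while `phi4` does not shrink at all and is flagged.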

Given that these parameters are not structurally unidentifiable, I calculated profile likelihood intervals and then ran data cloning on them, as suggested in Appendix F. I got this:

Parameter 4 now has a (very narrow) CI, and data cloning on that has produced an even narrower CI. I am not sure what this result indicates, but it could suggest that this parameter is identifiable and very close to the boundary.
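Why a likelihood-based interval can remain informative at the boundary can be illustrated with a toy case. The sketch below is not a CJS profile likelihood: it uses a single binomial probability (so there are no nuisance parameters to maximise over), with hypothetical counts in which every animal survives. The Wald SE collapses to zero, while the lower bound from the likelihood ratio is still well defined.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95% quantile of chi-square with 1 df


def binomial_loglik(p, successes, trials):
    # Log-likelihood of a binomial probability (constant term dropped).
    failures = trials - successes
    if p <= 0.0 or p > 1.0 or (p >= 1.0 and failures > 0):
        return float("-inf")
    loglik = successes * math.log(p)
    if failures > 0:
        loglik += failures * math.log(1.0 - p)
    return loglik


def lower_likelihood_bound(successes, trials, tol=1e-10):
    # Bisection for the lower 95% likelihood-ratio bound; needs successes > 0.
    p_hat = successes / trials
    target = binomial_loglik(p_hat, successes, trials) - CHI2_95_1DF / 2.0
    lo, hi = 1e-12, p_hat
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_loglik(mid, successes, trials) < target:
            lo = mid
        else:
            hi = mid
    return hi


successes, trials = 30, 30  # hypothetical: all 30 animals survive
p_hat = successes / trials
wald_se = math.sqrt(p_hat * (1.0 - p_hat) / trials)
lower = lower_likelihood_bound(successes, trials)
print(f"estimate = {p_hat}, Wald SE = {wald_se}, lower bound = {lower:.4f}")
```

A narrow interval hugging 1 in this toy case comes from an estimate sitting on the boundary, which matches the interpretation suggested above for parameter 4.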

After this, assuming parameter 4 is resolved, parameter 5 remains unestimated.
I tried simulated annealing by checking the box "Use Alt. Opt. Method", and this is the result:

The parameter 4 estimate has not benefited from this (which makes sense if its problem arises from being too close to the boundary rather than from multiple optima in the likelihood). Parameter 5 has been estimated as different from 1, but its CI is so broad that the estimate is useless.

Finally, I forced the model to have a constant P ({Phi(weight*t) P(.)}) to see what happens with these estimates, and this is what I got:

Now the estimate for parameter 5 seems realistic (also judging by the reduced m-array shown above), while parameter 4 is useless because of its huge CI, as would be expected if its estimation is difficult because it is close to the boundary.

However, this model has a DeltaAIC of more than 30 with respect to the "best" model {Phi(weight*t) P(sex*t)}. It is also not the way I have been told to carry out the analysis (first model recapture, then survival) and, last but not least, I haven't found this documented as a "good practice" in the forum or in the GI.

Any suggestion or commentary would be really appreciated.

Simone

by **abreton** » Thu Oct 20, 2011 12:56 pm

simone77 wrote: In this case I believe I couldn't use the sin link as I have an individual covariate and thus there is not a monotonic relationship, am I wrong?

No, you're correct.

simone77 wrote: Any suggestion or commentary would be really appreciated.

I suggest that you build model Phi(t) p(t) in the Design Matrix and run it using the sin link. Then translate the real parameter estimates from this model to the logit scale and use these logit estimates as starting values for model Phi(t) p(t) with the logit link. To do the translation, determine which phi is represented by the intercept; we'll call this phi-int. Let's say phi-int = 0.7576. On the logit scale,

logit(phi-int) = log odds = ln(0.7576/(1-0.7576)) = 1.1395

Now you have a starting value on the logit scale for the intercept. What about the time offsets (the other betas associated with phi) on the logit scale? You could plug 1.1395 into the logit back-transform function (see page 6-11 in the MARK book) and solve for the unknown, repeating this for each phi estimate associated with an offset. For example, say phi2 from your model was 0.8786; then

0.8786 = 1/(1+exp(-1*(1.1395+x))) (note there are many equivalent forms of this logit back-transform function).

x is the offset associated with phi2 on the logit scale. Repeat this process for p and then use the suite of logit estimates as starting values to run model Phi(t) p(t) with the logit link. If this model converges on estimates of phi4 and phi5, then change p to sex+t and use starting values from model Phi(t) p(t) (logit). If phi4 and phi5 are estimated, change p to sex*t and use starting values from Phi(t) p(sex+t). If this also converges on phi4 and phi5, then add weight to phi and use starting values from Phi(t) p(sex*t).
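The translation can be sketched in a few lines of Python. Solving the back-transform for x gives x = logit(phi2) - logit(phi-int), so each offset is simply a difference of logits. The sketch assumes reference-cell coding with the first estimate as the intercept; the function names are hypothetical, and the two values are the example figures used above.

```python
import math


def logit(p):
    # Log odds of a probability strictly between 0 and 1.
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    # Logit back-transform.
    return 1.0 / (1.0 + math.exp(-eta))


def logit_start_values(real_estimates):
    # First estimate is the intercept; the rest become offsets from it.
    intercept = logit(real_estimates[0])
    offsets = [logit(p) - intercept for p in real_estimates[1:]]
    return [intercept] + offsets


# Example values from the text: phi-int = 0.7576, phi2 = 0.8786.
betas = logit_start_values([0.7576, 0.8786])
print("intercept:", round(betas[0], 4))
print("offset for phi2:", round(betas[1], 4))
# Check: the back-transform of intercept + offset recovers phi2.
print("recovered phi2:", round(inv_logit(betas[0] + betas[1]), 4))
```

The intercept comes out at about 1.14 and the offset for phi2 at about 0.84. Estimates of exactly 0 or 1 have no finite logit, so they need separate handling before this step.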

Note: Don't use simulated annealing when trying this possible fix; use the default optimization method.

andre

by **abreton** » Thu Oct 20, 2011 1:12 pm

Another option to try: set the initial values of the beta offsets associated with phi4 and phi5 to zero. Use starting values for the other (beta) parameters from simpler models.

Note: Don't use simulated annealing when trying this possible fix; use the default optimization method.

by **Miguel** » Thu Oct 20, 2011 5:46 pm

I'm just chiming in to thank both Simone and Andre for this very insightful discussion. I hope that Simone will let us know how Andre's suggestion worked out.

I'm in the process of reanalyzing some long-term data that is both very valuable in conservation terms and sparse so I may need to try this approach at some point.

Thanks again,
Miguel

by **simone77** » Fri Oct 21, 2011 12:46 pm

I have followed Andre's suggestions and here are the results.

Abreton wrote: I suggest that you build model Phi(t) p(t) in the Design Matrix and run it using the sin link. From this model, translate the real parameter estimates from this model to the logit scale and use these logit estimates as starting values for model Phi(t) p(t) with the logit link.

I am not sure I understand the reason for this process. Without going into a deep mathematical explanation (I probably wouldn't follow it), I would like to understand the "conceptual" motivation. Here is my guess: you can't use the sin link with individual covariates, but at the same time the sin link does better* than the logit link when there are multiple optima in the likelihood. For these reasons, it would make sense to find the estimates corresponding to the global maximum with the sin link, translate them to the logit scale, set the translated values as initial parameter values, and run the model with the logit link.
*It's just a guess! I haven't found it stated anywhere.

Abreton wrote: To accomplish the translation, determine which phi is represented by the intercept, we'll call this phi-int.

This is the DM whose reference cell (phi-int) corresponds to Phi1 (see 6-83 of GI for suggestions about how to code reference cell).

I ran this model with the sin link:

Compared with the identical model run with the logit link,

the sin link gives a very narrow CI for both Phi4 and Phi5, but its UCI is nonsensical, being >1, whereas the logit link gives a very broad (and still nonsensical) CI for Phi5 and, which I don't understand, a UCI>1 for Phi4.

Afterwards, I transformed the Sin p to Logit p. (Looking at 6-11 of the GI, I would have supposed that the formula should be {0.8786 = 1/(1+exp(-1*(1.1395 + x)))} instead of {0.8786 = 1/(1+exp(-1*(1.1395 - x)))}; anyway, I have used yours.)

The first thing I noticed is that the Sin link p estimates are in some cases quite different from the Logit link ones. I am afraid I may have done something wrong, but I can't find what. I have tried replacing the "+" mentioned above with a "-", but the resulting p values are again very dissimilar. Also, the #¡DIV/0! errors arise because the Sin estimates of Phi4 and Phi5 are equal to one.
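The division-by-zero errors follow directly from the logit formula: for an estimate of exactly 1 the denominator 1 - p is zero, and as p approaches 1 the logit grows without bound. One common workaround, shown here only as a sketch, is to pull estimates slightly away from 0 and 1 before transforming. The function name and the tolerance `eps` are hypothetical choices, not something prescribed by MARK.

```python
import math


def safe_logit(p, eps=1e-6):
    # Clip p into [eps, 1 - eps] so the log odds stay finite.
    clipped = min(max(p, eps), 1.0 - eps)
    return math.log(clipped / (1.0 - clipped))


for estimate in (0.7576, 0.8786, 1.0):
    print(estimate, "->", round(safe_logit(estimate), 4))
```

With this tolerance a boundary estimate of 1 maps to a large positive value on the logit scale. Whether such an extreme value is a sensible starting point is a separate question.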

Abreton wrote: ...and then use the suite of logit estimates as starting values to run model Phi(t) p(t) with the logit link

Considering the formulas used to transform Sin p values to Logit ones,
1) logit(p) = ln(sinp/(1-sinp))
2) p-hat = 1/(1+exp(-1*(logit(p-intercept) + x)))
if sinp-->1 ; p-hat-->0
For this reason, I have set 0 as the starting value for Phi4 and Phi5 and I got:

Abreton wrote: Another option to try, set initial values of the beta offsets associated with phi4 and phi5 to zero. Use starting values for the other (beta) parameters from simpler models.

If I set the initial values for Phi4 and Phi5 to zero, I get this:

That gives, on the one hand, an estimate of Phi4 very close to one, again with a narrow CI and a (to me) unexpected UCI>1 and, on the other hand, a nonsensically huge CI for Phi5.

Abreton wrote: If this model converges on estimates of phi4 and phi5, then change p to sex+t and use starting values from model Phi(t) p(t) (logit). If phi4 and phi5 are estimated...change p to sex*t, use starting values from Phi(t) p(sex+t)...if this also converges on phi4 and phi5 then add weight to phi and use starting values from Phi(t) p(sex*t).

Since the model didn't converge on estimates of both Phi4 and Phi5, I haven't gone any further.

I have tried to follow the same procedure on another very similar dataset (I have data from three rabbit enclosures, but the session dates differ, which is why I cannot run a single analysis) and I got very similar results, with another difficult parameter.

Anyway, I suspect I could have done something wrong with the transformation of Sin determined p's to Logit p's.

Again, many thanks for the suggestions.

Simone

by **abreton** » Fri Oct 21, 2011 2:11 pm

Simone --

I should have added that if the sin link produces estimates for phi4 and phi5 that are essentially identical to those from the logit link, then don't bother with the transformation from the sin to the logit scale. My hope was that the sin link would produce phi4 and phi5 estimates that were more reasonable than the logit ones; it appears that the sin link is also having trouble estimating phi4 and phi5.

Try sim annealing with the sin link, model phi(t) p (t)?

I don't think it will make a difference but also consider running phi(t) p (t) using the PIMs. Any improvement in the estimate of phi4 and phi5?

I don't see any weirdness in the data that would affect phi4 and phi5 exclusively (looking at the reduced m-array that you provided) but I'm suspicious of a data problem. Very strange that, given your data which are not sparse relative to a lot of datasets that I've worked with, you cannot estimate these two parameters.

-- andre

by **simone77** » Sat Oct 22, 2011 10:40 am

Andre,

Abreton wrote: I should have added that if the sin link produces estimates for phi 4 and phi 5 that are essentially identical to the logit then don't bother with the transformation from sin to logit scale. My hope was that the sin link would have produced phi4 and phi5 estimates that were more reasonable than the logit, appears that the sin link is also having trouble estimating phi4 and phi5.

No problem, as Latins said, "Melius abundare quam deficere"! (It's better to overdo it than underdo it)

Abreton wrote: Try sim annealing with the sin link, model phi(t) p (t)?

I have tried simulated annealing on this model with both the sin and the logit link, but it didn't do the trick.

Abreton wrote: I don't think it will make a difference but also consider running phi(t) p (t) using the PIMs. Any improvement in the estimate of phi4 and phi5?

I have used almost all the link functions on the {phi(t) p(t) PIM} model, but this did not work either.

Abreton wrote: I don't see any weirdness in the data that would affect phi4 and phi5 exclusively (looking at the reduced m-array that you provided) but I'm suspicious of a data problem. Very strange that, given your data which are not sparse relative to a lot of datasets that I've worked with, you cannot estimate these two parameters.

These are three rabbit enclosures where some individuals are infected with myxomatosis or RHD.
So I believe the key to this could lie in a recent answer Choquet gave me. I have some trouble (apparently not too serious in terms of C-hat) with the GOF of these models, and he pointed out that, given there are both healthy and infected individuals, they could be a further, uncontrolled source of heterogeneity.
For reasons I haven't figured out yet, this heterogeneity could be making some specific survival parameters very difficult to estimate.
Unfortunately, I have infected/not-infected data for only about 40-50 randomly chosen individuals each occasion, which does not allow me to build groups or (at least I don't think so) to use multi-state models. I can only work with population-level disease prevalence, and this is not useful for reducing heterogeneity.

As always, any suggestion or commentary is very appreciated.

Simone

by **abreton** » Mon Oct 31, 2011 11:59 am

Have you considered a multi-state model? This model would include phi, p and psi (transition probabilities). In your case, animals can transition from one disease state to the other and back again. The disease states would be infected and not infected. Fitting your data to a multi-state model might result in estimates of phi4 and phi5 for each state.

If you can easily output different sets of encounter histories from your data, some other thoughts: (1) fit the data from each enclosure separately to the phi(t) p(t) model in MARK. Do you get estimates of phi4 and phi5 in all cases? If not, look at the reduced m-array(s) carefully. (2) Consider creating two input files, one for infected and one for not-infected animals. Fit these in MARK under CJS, again to model phi(t) p(t). Estimates for one group but not the other? Perhaps this will provide insights as well?

Ultimately though, I think the multi-state model makes the most biological sense, much better than CJS for your data. Nonetheless, fitting alternative datasets like I describe and fitting the phi(t) p(t) model might help you locate the source of the estimation problem with phi4 and phi5.

Been playing catch-up at work lately, sorry for the much delayed follow-up response.

andre

by **simone77** » Tue Nov 01, 2011 8:38 am

Yes, I am now working on multi-state models to analyze my dataset as discussed here.

Abreton wrote: (1) fit the data from each enclosure separately to the phi(t) p (t) model in MARK. Do you get estimates of phi4 and phi5 in all cases? If not, look at the reduced m-array(s) carefully.

Indeed, I have been looking at the m-array and I believe that part of these estimability problems is due to a somewhat unbalanced design, i.e. the number of captured individuals that were blood sampled varied among sessions, as did the male/female ratio (in one session, for example, no females were blood sampled). I need to look into this further.

Abreton wrote: (2) Consider creating two input files, infected and not-infected animals. Fit these in mark under CJS again to model phi(t) p (t). Estimates for one group but not the other? Perhaps this will provide insights as well?

I don't believe I can do it, because there are transitions between states and also because it would reduce the dataset too much. In fact, it would mean further reducing the dataset I am using in the multi-state models (here for details).

Thanks for answering,

Simone

by **abreton** » Tue Nov 01, 2011 3:55 pm

Just a note: my suggestion to build "two input files" was just for trying to find the source of the issue with phi4 and phi5; I did not intend this to be a 'final analysis' by any means. In the long run, I think this problem will go away when you start fitting multi-state models. From this class of models, if I can call them a 'class', those suggested by Conn and Cooch (2009) may be the most appropriate given your data.

andre

Parameters estimability
