www.phidot.org

by **steven** » Tue Dec 21, 2010 1:56 pm

We're using a multi-stratum live-recapture, dead-recovery analysis to investigate age-specific survival and stratum dynamics. We're fitting "non-parametric" models that generate stratum- and age-specific survival rates. These involve an identity design matrix. There are 3 strata in this analysis. We've noticed that there are qualitative differences in the results depending upon whether the link function is sin or logit. In particular, the order of the stratum-specific trends is changed. So, for example, with the sin link, stratum A has higher age-specific estimates than those of stratum B, which, in turn, are higher than those of stratum C. ("Top" to "Bottom", the order is A, B, C.) In contrast, with the logit link, stratum A has lower age-specific estimates than those of stratum B, which, in turn, are lower than those of stratum C. ("Top" to "Bottom", the order is C, B, A.) For the trait involved, there are major qualitative differences in the biological interpretation of these orderings. At least naively, it is worrisome that there is such a substantial difference associated with a data transformation.

The standard MARK advice here is

"The default is the SIN function, because the sin function is most useful with the identity design matrix to provide a constraint that keeps the real parameters in the [0, 1] interval, yet allows the number of parameters to be correctly estimated."

What I would love to get advice on is

1) the basis in the primary literature for the statement above as to why the sin function is preferable. have there been analytical or numerical studies demonstrating this to be the case?
2) suggestions for how to proceed in this particular case. as readers of this will know, one reason to use the logit link is that it allows correct parametric models to be fit (for these data, parametric models fit with the sin link are clearly wrong). However, this may not trump other reasons to use the sin link (especially since I am not a strong believer in parametric models, at least in the demographic context, and we could just drop them).

Any and all thoughts, advice, and references are much appreciated.

many thanks!

by **jlaake** » Tue Dec 21, 2010 2:43 pm

I presume from what you wrote that you used an identity design matrix for both. Is that correct? Multi-strata models are often difficult to optimize. If the only difference between the 2 models is the link function then the deviances should be identical or very close. If they are not then one (or both) did not converge. I suggest that you try simulated annealing to see if you can get better convergence. Also, it can help to repeat that process, each time using the final estimates from the previous run as the initial values.

--jeff
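
A minimal sketch of the point about deviances, assuming a toy binomial likelihood (30 successes in 100 trials) rather than a multi-stratum model. It assumes the usual forms of the two links, theta = (sin(beta) + 1)/2 and theta = 1/(1 + exp(-beta)); the function names and the golden-section optimizer are illustrative and are not MARK's routines. When both fits converge, the betas differ but the real parameter and the deviance agree.

```python
import math

def sin_link(beta):
    # Sin link: maps beta to the [0, 1] interval.
    return (math.sin(beta) + 1.0) / 2.0

def logit_link(beta):
    # Logit link: maps beta to the (0, 1) interval.
    return 1.0 / (1.0 + math.exp(-beta))

def deviance_kernel(theta, successes, trials):
    # -2 log-likelihood of a binomial, dropping the constant term.
    return -2.0 * (successes * math.log(theta)
                   + (trials - successes) * math.log(1.0 - theta))

def golden_section_min(f, lo, hi, tol=1e-10):
    # One-dimensional minimizer for a unimodal function on [lo, hi].
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

successes, trials = 30, 100  # hypothetical data

# Same likelihood, two parameterizations of theta.
beta_sin = golden_section_min(
    lambda b: deviance_kernel(sin_link(b), successes, trials), -1.5, 1.5)
beta_logit = golden_section_min(
    lambda b: deviance_kernel(logit_link(b), successes, trials), -10.0, 10.0)

theta_sin = sin_link(beta_sin)
theta_logit = logit_link(beta_logit)
dev_sin = deviance_kernel(theta_sin, successes, trials)
dev_logit = deviance_kernel(theta_logit, successes, trials)

print(f"sin link:   beta={beta_sin:.6f} theta={theta_sin:.6f} deviance={dev_sin:.6f}")
print(f"logit link: beta={beta_logit:.6f} theta={theta_logit:.6f} deviance={dev_logit:.6f}")
```

A deviance gap larger than rounding error between two such fits therefore points at the optimizer, not at the link itself.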

by **steven** » Tue Dec 21, 2010 2:54 pm

Many thanks for your thoughts. Yes, we used an identity design matrix for the sin analyses and for the logit analyses.

You wrote:
"Multi-strata models are often difficult to optimize. If the only difference between the 2 models is the link function then the deviances should be identical or very close. If they are not then one (or both) did not converge." In fact, there is a difference between the deviances that is clearly bigger than rounding error. By "did not converge", I think you mean that at least one of the optimizations of the likelihood did not converge to the global peak of the likelihood surface and instead converged to a local peak. Yes?

further thoughts are much appreciated.

by **cooch** » Tue Dec 21, 2010 3:27 pm

steven wrote:
What I would love to get advice on is

1) the basis in the primary literature for the statement above as to why the sin function is preferable. have there been analytical or numerical studies demonstrating this to be the case?

There is a fairly full discussion of link functions at various places in Chapter 6 in the MARK book. See for example the sidebar beginning on p. 23 (which discusses things like 'why is the sin link not used if the DM is not identity?').

2) suggestions for how to proceed in this particular case. as readers of this will know, one reason to use the logit link is that it allows correct parametric models to be fit (for these data, parametric models fit with the sin link are clearly wrong). However, this may not trump other reasons to use the sin link (especially since I am not a strong believer in parametric models, at least in the demographic context, and we could just drop them).

Any and all thoughts, advice, and references are much appreciated.

The whole notion of whether or not a link function is 'wrong' misplaces emphasis. Link functions are transformations (nothing more, nothing less) that have different properties -- some useful, some pathological for a given purpose. The issue with many of the link functions generally considered in MARK is how well they perform given (i) the bounds on the plausible parameter space (e.g., [0,1] bounded for some parameters), and (ii) how near the estimated value is to the bounds of that space. The sin link has better numerical properties for [0,1] bounded parameters than does the logit link, where the shape of the logit transform makes the likelihood effectively flat near the boundaries. This 'flatness' minimizes the information available to come up with a robust and precise estimate of the parameter. Hence, use of the sin link. However, the sin link has some issues of its own (see sidebar mentioned earlier).
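
One way to look at the boundary behaviour described above is to compare how quickly the real parameter responds to a change in beta under each link. The sketch below assumes the usual forms, theta = (sin(beta) + 1)/2 for the sin link and theta = 1/(1 + exp(-beta)) for the logit link; the function names are illustrative.

```python
import math

def sin_beta(theta):
    # Beta that gives theta under the sin link.
    return math.asin(2.0 * theta - 1.0)

def logit_beta(theta):
    # Beta that gives theta under the logit link.
    return math.log(theta / (1.0 - theta))

def sin_slope(beta):
    # d(theta)/d(beta) for theta = (sin(beta) + 1) / 2.
    return math.cos(beta) / 2.0

def logit_slope(beta):
    # d(theta)/d(beta) for theta = 1 / (1 + exp(-beta)).
    theta = 1.0 / (1.0 + math.exp(-beta))
    return theta * (1.0 - theta)

thetas = [0.5, 0.9, 0.99, 0.999, 0.999999]
rows = []
for theta in thetas:
    b_sin = sin_beta(theta)
    b_logit = logit_beta(theta)
    rows.append((theta, b_sin, sin_slope(b_sin), b_logit, logit_slope(b_logit)))

for theta, b_sin, s_sin, b_logit, s_logit in rows:
    print(f"theta={theta:<9} sin: beta={b_sin:8.4f} slope={s_sin:.2e}   "
          f"logit: beta={b_logit:8.4f} slope={s_logit:.2e}")
```

Both slopes shrink as theta approaches 1, but the logit slope shrinks as theta(1 - theta) while the sin slope shrinks only as its square root. The sin link also reaches 0 and 1 at finite beta (plus or minus pi/2), whereas the logit link needs beta to run off toward infinity.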

Jeff mentioned simulated annealing. Good advice. MS models are notorious for having the potential for multiple local optima in the likelihood, and the default optimization routine in MARK (and many other applications, including R, SAS, etc.) might not (in some cases) do a particularly good job of finding the global solution. A numerically intensive approach which almost always works (in my experience) is simulated annealing. All of this is discussed in considerable detail in Chapter 8 (sidebar beginning on p. 33). Also discussed is the use of the trace from MARK's 'MCMC estimation' to identify such local minima, both from differences in various moments of the posterior and (more usefully) from the time series of steps in the chain, which often is very good at revealing these local minima (since the chain will periodically get trapped in the vicinity of these minima, which will be pretty obvious when you see it).

My general advice for MS models is to use simulated annealing to determine values, at least for the most general (most parameterized) models in the model set, since these seem to be the most prone to local minima -- if you don't have local minima for the general model, you're unlikely to have them for reduced-parameter versions of the general model. Simulated annealing is *not* speedy though (I did an analysis for Gary once on my 'hot rod' machine that took almost 8 days to finish with SA, <2 hours using the default numerical optimization).
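
The contrast between a default local optimizer and simulated annealing can be sketched on a toy surface. Everything here is an assumption made for illustration: the one-dimensional double-well objective stands in for a negative log-likelihood with a local and a global minimum, and the step size, cooling schedule, proposal spread and seed are arbitrary. It is not MARK's implementation of either method.

```python
import math
import random

def objective(x):
    # Toy double-well surface: deeper minimum at negative x,
    # shallower local minimum at positive x.
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def local_descent(start, step=0.01, n_iter=5000):
    # Plain gradient descent: settles in whichever basin it starts in.
    x = start
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x

def simulated_annealing(start, t0=1.0, cooling=0.999, n_iter=10000,
                        proposal_sd=0.5, seed=1):
    rng = random.Random(seed)
    x, fx = start, objective(start)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, proposal_sd)
        f_cand = objective(cand)
        # Metropolis rule: always accept improvements, and accept a worse
        # point with a probability that falls as the temperature cools.
        if f_cand <= fx or rng.random() < math.exp((fx - f_cand) / temp):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling
    return best_x, best_f

start = 1.0  # deliberately placed in the shallower basin
x_local = local_descent(start)
x_sa, f_sa = simulated_annealing(start)

print(f"local descent:       x={x_local:.4f} f={objective(x_local):.4f}")
print(f"simulated annealing: x={x_sa:.4f} f={f_sa:.4f}")
```

The cost is visible even here: the annealing run evaluates the objective at every one of its 10000 proposals with no gradient to guide it, which is the same trade-off as the run times quoted above.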

by **dhewitt** » Tue Dec 21, 2010 3:35 pm

I don't have experience with MS models and this problem, but with fairly complex CJS models I have encountered situations where simulated annealing went bonkers, even worse than SIN or LOGIT link functions and default optimization. Sparse data and overparameterized models are always bad news. For what it's worth...

- Dave

by **steven** » Tue Dec 21, 2010 4:01 pm

Many thanks once again to all who have replied.

For what it is worth, this is a large data set (>5000 encounter histories; in case you're wondering, these are human data), or at least large in my experience. In addition, we are seeing the discrepancies I described for models that are not overparameterized (at least as judged by getting meaningful estimates from these same models in other projects, even with much smaller data sets). I know that "overparameterized" is always contingent on idiosyncrasies of the data...

Evan's comment about the link functions "just" being transformations is well put. The info about the flatness near the boundary for the logit link makes sense. Any further thoughts about the ins and outs of sin vs. logit would be much appreciated. References to any more elaborate discussion of this in the literature are most welcome.

I confess to wondering whether most users compare estimates based on different scales. We did this because we could not fit parametric models to these data using the logit scale, and so fit all of our models (including the non-parametric ones) on the sin scale. (Of course, the parametric models on the sin scale turned out to be wrong.) We then saw the differences between sin and logit for the non-parametric models.

this is all quite educational!

many thanks,

by **cooch** » Tue Dec 21, 2010 5:15 pm

dhewitt wrote:I don't have experience with MS models and this problem, but with fairly complex CJS models I have encountered situations where simulated annealing went bonkers, even worse than SIN or LOGIT link functions and default optimization. Sparse data and overparameterized models are always bad news. For what it's worth...

- Dave

I mentioned simulated annealing only as a robust solution for local minima -- it is not an omnibus solution for lousy data (a problem often exacerbated by overparameterized models).

by **Doherty** » Tue Dec 21, 2010 5:31 pm

I am not sure about what all possible transitions are in your problem... but with more than 2 strata (you have three) and transitions possible between all strata, then you might consider using the MLogit (multinomial logit) link. With multiple strata the MLogit is a much better link function in terms of convergence and keeping associated transitions summed to 1. However, setting those MLogit link functions up correctly can be a bit tricky - you need to rely upon the "Parm-Specific" option in the Link Function box and understand what you are doing. Evan describes the process in a sidebar in Chapter 8 (pg 22).

:mrgreen:

by **cooch** » Tue Dec 21, 2010 5:38 pm

Doherty wrote:I am not sure about what all possible transitions are in your problem... but with more than 2 strata (you have three) and transitions possible between all strata, then you might consider using the MLogit (multinomial logit) link. With multiple strata the MLogit is a much better link function in terms of convergence and keeping associated transitions summed to 1. However, setting those MLogit link functions up correctly can be a bit tricky - you need to rely upon the "Parm-Specific" option in the Link Function box and understand what you are doing. Evan describes the process in a sidebar in Chapter 8 (pg 22).

Despite the offensive graphic (!), Paul raises a good point that hadn't been mentioned up until now -- the multinomial link, which is potentially important for higher-dimension MS problems. It is also heavily used in the 'open' robust design models Kendall describes in Chapter 15.
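
A small sketch of why the multinomial logit matters for transitions out of a single stratum. It assumes the usual form psi_i = exp(beta_i) / (1 + sum_j exp(beta_j)), with the remaining transition (here, staying in the same stratum) obtained by subtraction. The beta values are made up and the function names are illustrative.

```python
import math

def logit_inverse(beta):
    # Ordinary logit link applied to one parameter at a time.
    return 1.0 / (1.0 + math.exp(-beta))

def mlogit_inverse(betas):
    # Multinomial logit applied jointly to a set of related parameters.
    expo = [math.exp(b) for b in betas]
    denom = 1.0 + sum(expo)
    return [e / denom for e in expo]

# Hypothetical betas for the transitions A->B and A->C.
betas = [2.0, 1.5]

separate = [logit_inverse(b) for b in betas]
joint = mlogit_inverse(betas)
stay = 1.0 - sum(joint)  # A->A obtained by subtraction

print("separate logit links:", [round(p, 4) for p in separate],
      "sum =", round(sum(separate), 4))
print("mlogit link:         ", [round(p, 4) for p in joint],
      "stay =", round(stay, 4))
```

With separate logit links each transition is bounded by itself, so nothing prevents the pair from summing to more than 1. Under the mlogit form the shared denominator keeps the whole set, including the probability found by subtraction, inside [0, 1].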

by **jlaake** » Tue Dec 21, 2010 6:15 pm

For the reasons stated above, in RMark I default to use of the Mlogit link for parameters like Psi. RMark works out the Parm-specific link for you. However, that is not without its problems when you have large numbers of groups, strata, or occasions: each PIM value has to be different for the Mlogit link, so it can create large DMs, because you can't simplify the PIM structure due to the way MARK creates the MLogit real parameters. If you were to jump into RMark it would also help you to iterate over the optimization process while setting the initial values to the final parameter estimates of the previous run. Good initial values can be quite important for convergence.

One final point is that a large sample size does not guarantee that you don't have sparse data, because that depends on the model. I have been working with an individual offlist on a CJS model where he was getting differences between a model run in MARK with the sin link or logit link and one run in RMark with the logit link. That would seem to be impossible, but the reason was that in the MARK run he was using an identity DM and with RMark it was a non-identity DM. By using simulated annealing and repeating the optimization I was able to get the RMark run to match the MARK run to roundoff error. It turns out the reason for the problem was that he was specifying a model for which there were no data in a cell. He had 2 groups, juveniles and adults, the model specified a year*group interaction, and there were no juveniles released in one of the years. When you use an identity DM, there is a separate parameter for the cell with the missing data; it does nothing to the optimization and ends up with a large SE. However, when the DM is additive and includes the parameter with no data, then the standard optimization code in MARK can wander.

The nice thing about using simulated annealing is that for each run it always reported that the numerical convergence was suspect whereas the standard optimization would act like it converged when in fact it was not at the global optimum. --jeff
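
The idea of rerunning the optimization from the previous run's final estimates can be sketched generically. This is an illustration only: `capped_descent` is a hypothetical stand-in for an optimizer that stops after a fixed number of iterations, the quadratic objective is a toy, and none of the names correspond to RMark or MARK functions.

```python
def capped_descent(grad, start, step=0.05, max_iter=25):
    # Stand-in for an optimizer that stops early: a fixed number of
    # gradient steps, whether or not it has reached the optimum.
    x = start
    for _ in range(max_iter):
        x -= step * grad(x)
    return x

def refit_until_stable(objective, grad, start, tol=1e-10, max_runs=200):
    # Rerun the optimizer, seeding each run with the previous run's final
    # estimate, until the objective stops changing between runs.
    x = start
    value = objective(x)
    history = [value]
    for _ in range(max_runs):
        x_new = capped_descent(grad, x)
        value_new = objective(x_new)
        history.append(value_new)
        if abs(value - value_new) < tol:
            return x_new, history
        x, value = x_new, value_new
    return x, history

# Toy objective with its minimum at x = 3.
def objective(x):
    return (x - 3.0) ** 2

def grad(x):
    return 2.0 * (x - 3.0)

estimate, history = refit_until_stable(objective, grad, start=0.0)
print(f"estimate={estimate:.8f} after {len(history) - 1} runs")
```

A single capped run stops short of the optimum here; it is the chain of restarts that carries the estimate the rest of the way.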

sin vs. logit link
