I have some questions concerning model ranking.
1. How to define the global model. I first ranked the factors (time, departure state, etc.) that could affect each biological parameter in order of (subjective) importance, and then built the models up by adding these factors to the model step by step in that order. The global model defined this way is the most parameterized one whose estimable parameters do not clearly depend on extrinsic redundancy (i.e., as a rough rule of thumb, I have allowed only the final mathematical parameters to be inestimable).
Does it make sense?
2.1. Sometimes during the model selection process I cannot get a model to converge when I remove an effect from one of the more parameterized models, yet when I remove the same effect at a later step of model selection (i.e. when the model is less parameterized) the model converges without any problem.
For example, considering
{IS(t) T1(sex.state.t) T2(sex.f) E1(age1+age2.t) E2(state.t)}
I cannot remove the state effect from the E2 biological parameter
({IS(t) T1(sex.state.t) T2(sex.f) E1(age1+age2.t) E2(t)})
and get the model to converge (I have also tried setting 80 multiple random initial values, continuing after 3 cycles with 5000 iterations per cycle; a generic sketch of that multi-start check is given after this question).
Yet later on in the model selection process, starting from
{IS(t) T1(t) T2(f) E1(age1+age2.t) E2(state.t)}
I can obtain without any problem the model
{IS(t) T1(t) T2(f) E1(age1+age2.t) E2(t)}
In general my feeling is that removing the time effect on IS and E2 makes the models very unstable, and this has led me to keep those effects in the global model even though they are not ranked highly in my own importance/interest ranking. Any suggestions on this?
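For clarity, by "multiple random initial values" I mean the usual check of refitting the same model from many starting points and comparing the resulting minima. A rough Python sketch of that idea, not tied to any particular software (neg_log_lik and n_params are placeholders for whatever objective function and parameter count apply):

    import numpy as np
    from scipy.optimize import minimize

    def fit_from_random_starts(neg_log_lik, n_params, n_starts=80, seed=1):
        # Refit the same model from many random starting points and return
        # the fits sorted by the minimized negative log-likelihood, so stops
        # at local optima show up as a spread of final values.
        rng = np.random.default_rng(seed)
        fits = []
        for _ in range(n_starts):
            x0 = rng.uniform(-2.0, 2.0, size=n_params)  # arbitrary starting box
            fits.append(minimize(neg_log_lik, x0, method="BFGS"))
        return sorted(fits, key=lambda r: r.fun)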
2.2. Given these kinds of difficulty during model selection, I have decided to simplify first E1 (capture rate), then T2 (transition rate), then T1 (survival rate), then IS (state probability at first capture), and finally E2 (state assignment probability).
Any trouble with this way of selecting models?
3. Burnham and Anderson (2002, p. 131) say that:

For this reason I have back-calculated the maximized log-likelihood from the QAIC, the number of parameters, the sample size (number of captures minus number of removals), and the c-hat.
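If it is useful to check that arithmetic, here is a minimal Python sketch of the back-calculation, assuming the small-sample form QAICc = -2*logL/c-hat + 2*K + 2*K*(K+1)/(n-K-1); the function and variable names are mine, and if the output is plain QAIC the last penalty term simply drops out:

    def loglik_from_qaicc(qaicc, K, n, c_hat=1.0):
        # Invert QAICc = -2*logL/c_hat + 2*K + 2*K*(K+1)/(n - K - 1)
        penalty = 2 * K + 2 * K * (K + 1) / (n - K - 1)
        return -0.5 * c_hat * (qaicc - penalty)

    # Hypothetical values, only to show the call:
    # loglik_from_qaicc(qaicc=6800.0, K=25, n=1200, c_hat=1.3)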
Such similar log-likelihoods are something I find relatively often in my analysis, mainly when modelling T2, where T2(sex+state) and T2(state) yield very similar maximized log-likelihoods (e.g. -3371.96574 vs. -3373.23043).
However, I have not found any rule of thumb for how close the log-likelihoods must be before the larger model should be considered "not really supported" or merely competitive, in the sense Burnham and Anderson describe.
In a case like the one above (-3371.96574 vs. -3373.23043), should I drop the larger model from my model list?
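To make the question concrete, this is the comparison I have in mind, written out in Python; the extra parameter count and c-hat are my assumptions (one additive sex parameter, c-hat = 1), so with different values the result will change:

    loglik_large = -3371.96574   # T2(sex+state), the more parameterized model
    loglik_small = -3373.23043   # T2(state)
    c_hat = 1.0                  # assumed; substitute the estimated c-hat
    k_extra = 1                  # assumed: one additive sex parameter

    # Change in QAIC from adding the extra parameter(s); positive means the
    # larger model ranks worse, negative means it ranks (slightly) better.
    delta_qaic = 2 * k_extra - 2 * (loglik_large - loglik_small) / c_hat
    print(delta_qaic)            # about -0.53, i.e. the two models are essentially tied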
That's all (I hope to keep quiet for a while).
Thanks for any help
Simone