www.phidot.org

by **RJB** » Tue Oct 25, 2011 9:35 am

Hi MARK users,
I’ve been trying to find some documentation on what to do about collinearity between covariates when modeling in RMark, but I’ve had no luck so far.

Problem:
I have collinearity between two continuous covariates (r = 0.69), release date and age at release. I’m reluctant to just remove one as I am interested in their individual effects. The table I have currently is below; however, it's incorrect to interpret effects of the individual covariates from models 1 and 3.
Model | npar | QAIC | Delta | Weight | Dev
1. S(~rel.date) | 17 | 924.5 | 0.00 | 0.551 | 888.70
2. S(~release.age + rel.date) | 18 | 926.71 | 2.203 | 0.18 | 888.6
3. S(~release.age) | 17 | 929.3 | 4.81 | 0.04 | 893.5
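To put r = 0.69 in perspective, one rough gauge is the variance inflation factor, which for exactly two covariates reduces to 1 / (1 - r^2). The sketch below is only an approximation borrowed from ordinary linear regression; it assumes the same intuition carries over to covariates on the link scale of a survival model, which is not something the thread establishes.

```python
def vif_two_covariates(r: float) -> float:
    """Variance inflation factor for a model with exactly two covariates
    whose Pearson correlation is r: VIF = 1 / (1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

vif = vif_two_covariates(0.69)   # correlation reported in the post
se_inflation = vif ** 0.5        # approximate factor on each slope's standard error
print(f"VIF = {vif:.2f}, SE inflation = {se_inflation:.2f}")
```

Under that assumption, the slope standard errors in the additive model would be roughly 1.4 times what they would be with uncorrelated covariates.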

Freckleton (2002) (Misuse of residuals in multiple regression) suggested that for multiple regressions, the two covariates should be included in the model together for more reliable parameter estimation. I can imagine the same would be true here.

So under this scenario, am I best to just use the additive model 2 knowing that these covariates are correlated? Model 1 is clearly the “better” model by QAIC; the extra parameter in model 2 explains slightly more of the deviance, but obviously not enough to make it as parsimonious.

Any advice on how to tease out this issue would be appreciated.

by **birdman** » Tue Oct 25, 2011 10:11 am

Regarding which model to use, that's what AIC is there for. Assuming for a moment that your models are structurally acceptable, you've got Delta QAICs of 0.0, 2.203 and 4.81, and related weights of 0.55, 0.18, and 0.04 suggesting some model selection uncertainty. The top model is 3x "better" than the second model, and substantially better than the third, though a Delta of 4.8 still suggests some support for the third. You could make a strong case for model averaging in this situation. Again, assuming that second model is appropriate.
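The "3x better" figure can be checked from the deltas alone, since the ratio of two Akaike weights depends only on the difference between their deltas and not on which other models are in the set. A minimal sketch, using the Delta QAIC values quoted above (the function name is just for illustration):

```python
import math

def evidence_ratio(delta_i: float, delta_j: float) -> float:
    """Ratio of Akaike weights w_i / w_j = exp(-(delta_i - delta_j) / 2)."""
    return math.exp(-0.5 * (delta_i - delta_j))

# Delta QAIC values quoted in the thread
ratio_1_vs_2 = evidence_ratio(0.0, 2.203)
ratio_1_vs_3 = evidence_ratio(0.0, 4.81)
print(f"{ratio_1_vs_2:.2f} {ratio_1_vs_3:.2f}")
```

The first ratio comes out at about 3, in line with 0.55 / 0.18; the second is about 11, which differs a little from the ratio of the tabled weights because those are rounded.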

I can't provide a good answer regarding whether or not it is appropriate to include the two correlated variables in the same model. Maybe there's an appropriate way to compress them into a single variable that utilizes the information in both appropriately?

Hope this helps. I'm sure someone more knowledgeable will pipe in on the covariate issue.

by **jlaake** » Tue Oct 25, 2011 10:24 am

Clearly release date is a better predictor than release age and the second model is in no way competitive. It is an example where adding the covariate made almost no improvement in the model fit and it only appears competitive because it only added one more parameter. I could add a single covariate to your top model that was in no way related (e.g., Dow-Jones average on the release date) and it would have an AIC no more than 2 units away (if c=1). This type of comparison is discussed in the B&A Model Selection book. --jeff
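The "no more than 2 units" point follows directly from the AIC formula with c = 1: one extra parameter costs 2, and a nested model's deviance cannot go up when a parameter is added. A small sketch with made-up deviances (the 900.0 and 899.9 values are hypothetical, not from the table):

```python
def aic(deviance: float, n_params: int) -> float:
    """AIC = deviance + 2K, assuming c = 1 (no overdispersion adjustment)."""
    return deviance + 2 * n_params

def delta_from_extra_parameter(deviance_drop: float) -> float:
    """Change in AIC when one parameter is added to a nested model and
    the deviance falls by deviance_drop (which must be >= 0)."""
    if deviance_drop < 0:
        raise ValueError("a nested model cannot fit worse with an added parameter")
    return 2.0 - deviance_drop

# Hypothetical base model: deviance 900.0 with 17 parameters
base = aic(900.0, 17)
useless = aic(900.0, 18)   # added covariate explains nothing
slight = aic(899.9, 18)    # added covariate lowers the deviance by 0.1
print(useless - base, round(slight - base, 2))
```

So a delta just under 2 for a model that differs from the top model by one parameter says almost nothing in favour of the added covariate.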

by **birdman** » Tue Nov 01, 2011 12:00 pm

Jeff, thanks for the correction. I've got this underlined, highlighted, and tabbed in my copy of B&A, but it's been awhile since I've thought about this particular issue. Consider my crow eaten...

I would ask a follow-up though, because I've had people give me ambivalent or conflicting answers. Assume a similar situation with more models. Let's say 4 models have deltas less than 2 and enough weight to be apparently competitive. Upon inspection, as in the original post, the second model is identical to the first except for one additional parameter which obviously doesn't improve the model. Given that there remains support for three models that are not different in only one parameter, is it appropriate to drop the second model (as redundant) prior to model averaging, or should it be retained? Obviously, if the variable is highly correlated with another in the model, it shouldn't have been run in the first place, but I'm assuming in this instance that's not the case.

Also, I should note that I understand part of this should be taken care of in the a priori decisions about which model to include, but there remain situations when a model with K parameters and a second with K+1 parameters will be run in the same set, and this situation could arise.

Thanks for any input.

by **jlaake** » Tue Nov 01, 2011 12:23 pm

In my view, the model with the added useless covariate is essentially a redundant model with the model not containing the useless covariate. If you leave it in, it will unfairly weight that model relative to the others. It is a clear choice to drop it if the delta AIC is close to 2 but the problem becomes what is close to 2 -- 1.5? 1.0? Thus, it is hard to specify a rationale for a cutoff. In this case it is an easy decision to drop it because clearly the covariates roughly represent the same quantity. I never even consider models containing both covariates that are different measures of the same quantity. AIC and model averaging should NEVER be viewed as a recipe to avoid rational thinking and logic in decision making. Bottom line is that you need to think about the biological system and what you are trying to represent with your models. Be honest in your analysis and provide a good justification and that is all anyone can and should expect. Sorry for getting on the soapbox, but I think many folks get a little too AIC whacko and stop thinking and it drives me nuts. AIC has become a set of mental handcuffs for many.
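The weighting problem can be illustrated numerically. The sketch below uses a hypothetical four-model set (all deltas are invented) in which "A+junk" is model A plus one uninformative covariate, and compares the support given to A's structure with and without the redundant model.

```python
import math

def akaike_weights(deltas: dict[str, float]) -> dict[str, float]:
    """Akaike weights from delta values measured against a common baseline."""
    best = min(deltas.values())
    rel = {m: math.exp(-0.5 * (d - best)) for m, d in deltas.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical model set; "A+junk" is model A plus one uninformative covariate
deltas = {"A": 0.0, "A+junk": 1.9, "B": 1.2, "C": 1.8}

full = akaike_weights(deltas)
reduced = akaike_weights({m: d for m, d in deltas.items() if m != "A+junk"})

# Support carried by A's structure when the redundant copy is counted too
a_structure_full = full["A"] + full["A+junk"]
print(round(a_structure_full, 3), round(reduced["A"], 3))
```

With the redundant model retained, A's structure collects weight twice, so B and C are pushed down relative to a set in which each structure appears once.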

regards --jeff

by **RJB** » Thu Nov 03, 2011 11:43 am

Hi Guys,
thanks for the advice. I've since read through B&A.
I have a further query related to a very similar situation. I have an a priori set of models to test the influence of transmitters on survival, specifically to look at the effect of backpacks. The two top models are below; they are nearly identical and differ by 1 parameter, but this parameter is the slope. So model 1 is constant survival (S(c)), with only the intercept, whereas model 2 has the intercept and slope. In the cases above, we had 2 covariates, whereas here we are adding 1 covariate (and parameter) to the "null" model.

Model | QAICc | ΔQAICc | QAICc weights | No. parameters | -2LogL
1. S(c) ∙ p(t) ∙ r(att) | 931.90 | 0.00 | 0.39 | 16 | 1131.85
2. S(backpack) ∙ p(t) ∙ r(att) | 932.09 | 0.19 | 0.35 | 17 | 1129.31

Q1- Is the interpretation the same as the upper posts in this situation? i.e. backpack has near 0 effect.

Q2- My specific aim was always to give estimates for the 2 groups (with vs without backpacks) but also show if it is a significant influence on survival (the effect is likely to be small). Am I justified to use model-averaged estimates in this case?

As always, any advice would be appreciated.
Regards,
RJB

by **jlaake** » Thu Nov 03, 2011 1:42 pm

Q1- Is the interpretation the same as the upper posts in this situation? i.e. backpack has near 0 effect.

It is the same in that you are comparing 2 models that differ by a single parameter. It is different in that you are using QAIC, delta is no longer 2 and we aren't talking about covariates that are essentially the same. Re-read my last post regarding differences. Bottom line is that you can't really differentiate between these models.

Q2- My specific aim was always to give estimates for the 2 groups (with vs without backpacks) but also show if it is a significant influence on survival (the effect is likely to be small). Am I justified to use model-averaged estimates in this case?

Why would you model average? Model 2 provides your best estimate of a backpack effect and its precision and what you intended to provide. Given the closeness of the models, the precision on the backpack estimate will be poor (large se). I would only model average to provide the "best" (in some sense) predicted survival rate.
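For the case where a model-averaged predicted survival rate is what is wanted, the calculation can be sketched as below, using the weighted mean and the unconditional standard error described in Burnham & Anderson (2002). The weights are the two from the table, renormalized over just those two models (an assumption, since the table weights do not sum to 1); the survival estimates and standard errors are hypothetical placeholders, not values from this analysis.

```python
import math

def model_average(estimates, std_errors, weights):
    """Model-averaged estimate and unconditional SE:
    theta_bar = sum(w_i * theta_i)
    se        = sum(w_i * sqrt(se_i^2 + (theta_i - theta_bar)^2))
    """
    total = sum(weights)
    w = [x / total for x in weights]   # renormalize over the models used
    theta_bar = sum(wi * t for wi, t in zip(w, estimates))
    se = sum(wi * math.sqrt(s ** 2 + (t - theta_bar) ** 2)
             for wi, t, s in zip(w, estimates, std_errors))
    return theta_bar, se

# Weights from the thread's table; estimates and SEs are HYPOTHETICAL
weights = [0.39, 0.35]        # S(c), S(backpack)
estimates = [0.80, 0.76]      # predicted survival for a bird with a backpack
std_errors = [0.03, 0.05]

s_bar, s_se = model_average(estimates, std_errors, weights)
print(round(s_bar, 4), round(s_se, 4))
```

The averaged value is a prediction of survival, not an estimate of the backpack effect itself; for the effect and its precision, the slope from model 2 is the quantity to report, as noted above.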

--jeff

Correlated Covariates
