AIC is unacceptable

Forum for discussion of general questions related to study design and/or analysis of existing data - software neutral.

Postby geoffwah » Tue Oct 20, 2009 11:36 pm

Hi Folks

Thought I'd throw my hat into the ring here.

I quite like the use of the 'cumulative weight' (w+) as an index of the relative importance of predictor variables within an AIC framework (detailed on p. 168 of Burnham and Anderson 2002). It makes for a neat summary of a standard AIC table, or an alternative to one, when the primary interest is in support for the predictor variables rather than the models per se. See Mazerolle et al. 2005, Ecological Applications 15: 824-834, and Moore and Swihart 2005, Journal of Wildlife Management 69: 933-949, for a couple of examples.
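For concreteness, here's a minimal sketch (in Python, with made-up predictors and AIC values — not from any real analysis) of how Akaike weights and the cumulative weight w+ are computed:

```python
import math

# Hypothetical model set: each entry is (set of predictors, AIC value).
models = [
    ({"rain"}, 210.3),
    ({"temp"}, 212.1),
    ({"rain", "temp"}, 209.8),
    (set(), 215.0),          # intercept-only model
]

# Akaike weight of model i: exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i = AIC_i - min(AIC).
best = min(aic for _, aic in models)
raw = [math.exp(-(aic - best) / 2) for _, aic in models]
weights = [r / sum(raw) for r in raw]

# Cumulative weight w+ for a variable: sum the Akaike weights of
# every model in which that variable appears.
def w_plus(var):
    return sum(w for (preds, _), w in zip(models, weights) if var in preds)
```

The weights sum to 1 over the model set, so w+ for any variable lies between 0 and 1.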

Cheers

Geoff
geoffwah
 
Posts: 28
Joined: Tue Aug 28, 2007 8:47 pm
Location: Melbourne, Australia

Postby geoffwah » Tue Oct 20, 2009 11:52 pm

Forgot to mention that the format I'm particularly keen on in this regard is to have one column showing the model-averaged beta estimates for each variable (with their 95% CIs), adjacent to which sits a column with the cumulative weights. That way, 'effect sizes' are also portrayed...
geoffwah
 
Posts: 28
Joined: Tue Aug 28, 2007 8:47 pm
Location: Melbourne, Australia

AIC is unacceptable

Postby jlaufenb » Wed Oct 21, 2009 1:31 am

After looking up Geoff's reference in B&A, I noticed that I had highlighted a segment of text on pg 169 for further consideration: "When assessing the relative importance of variables using sums of the w(i), it is important to achieve a balance in the number of models that contain each variable j."

I will fully admit that I am a graduate student currently learning the intricacies of AIC based model selection & multimodel inference and not a statistician by trade. Ironically, Geoff's post raised a question for which I welcome advice from those well versed in the field of model development and multimodel inference.

My understanding is that a great deal of effort should be spent on developing an a priori set of models that reflect plausible probability distributions of the population parameters from which sample data arise. Those models should be (unless the study is an exploratory pilot study) based on the biology of the species being studied. In the example on page 168 of B&A, there are 3 explanatory variables and 8 models representing all combinations of variables, which meets the "balance in the number of models that contain each variable j" requirement for "w+"-based inference. However, I would expect that judicious model development based on a species' biology would result in a set of models in which each variable is not equally represented.

So, the question is: if one is interested in the relative importance of predictor variables, how should he/she proceed with model development? I would think that attempting to put all variables on equal "footing" would result in model sets becoming so large that spurious effects are supported by the data, and relative variable importance would be biased.

Thanks
Jared
jlaufenb
 
Posts: 49
Joined: Tue Aug 05, 2008 2:12 pm
Location: Anchorage, AK

Postby cooch » Wed Oct 21, 2009 7:47 am

geoffwah wrote:Hi Folks

Thought I'd throw my hat into the ring here.

I quite like the use of the 'cumulative weight' (w+) as an index of the relative importance of predictor variables within an AIC framework (detailed on p. 168 of Burnham and Anderson 2002). It makes for a neat summary of a standard AIC table, or an alternative to one, when the primary interest is in support for the predictor variables rather than the models per se. See Mazerolle et al. 2005, Ecological Applications 15: 824-834, and Moore and Swihart 2005, Journal of Wildlife Management 69: 933-949, for a couple of examples.

Cheers

Geoff



Except that cumulative weights aren't particularly robust, at least in some cases. While the approach is convenient, and perhaps informative in some cases, there are issues with how such weights can be influenced when the model set isn't symmetric with respect to all the variables of interest. Again, go read the following, which is the most recent treatment of the subject.

Murray & Conner 2009. Methods to quantify variable importance: implications for the analysis of noisy ecological data. Ecology 90: 348-355.


And, again, this somewhat misses the point Jeff and I (and, in fact, B&A) make: focus as much as possible should be on effect size. If you're not doing an experiment targeting a factor of interest, looking for the effects of one variable or another based on a posteriori model fitting is almost invariably post hoc to some degree. The one approach which might be less prone to these sorts of issues is a priori specification of biologically relevant effect sizes, then using model-averaged estimates of the effect for a given factor relative to that specified 'relevant' level.

So, one approach would be to calculate (model averaged) effect sizes for the various factors (which would include sign and magnitude) and tabulate those. I acknowledge that this isn't always trivial, but it is doable.
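A rough sketch of how a model-averaged coefficient is tabulated, showing the two common conventions (the weights and betas here are hypothetical, not from any real analysis):

```python
# Akaike weights (sum to 1) and per-model betas for one variable;
# None marks models in which the variable does not appear.
weights = [0.46, 0.36, 0.15, 0.03]
betas   = [0.82, 0.75, None, None]

# Conditional average: average only over models containing the
# variable, renormalizing their weights to sum to 1.
pairs = [(w, b) for w, b in zip(weights, betas) if b is not None]
wsum = sum(w for w, _ in pairs)
beta_conditional = sum(w * b for w, b in pairs) / wsum

# "Shrinkage" average: models lacking the variable contribute a
# coefficient of 0, pulling the estimate toward zero.
beta_shrunk = sum(w * (b if b is not None else 0.0)
                  for w, b in zip(weights, betas))
```

The shrinkage version is always at or below the conditional one in magnitude, which is part of why the choice between them matters when reporting effect sizes.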
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: AIC is unacceptable

Postby cnagy » Wed Oct 21, 2009 11:12 am

jlaufenb wrote:After looking up Geoff's reference in B&A, I noticed that I had highlighted a segment of text on pg 169 for further consideration: "When assessing the relative importance of variables using sums of the w(i), it is important to achieve a balance in the number of models that contain each variable j."

...

So, the question is: if one is interested in the relative importance of predictor variables, how should he/she proceed with model development? I would think that attempting to put all variables on equal "footing" would result in model sets becoming so large that spurious effects are supported by the data, and relative variable importance would be biased.


This came up in a study I wrapped up recently...we were particularly concerned with the effect of one variable, and had a few others that we thought would be important. The first variable was in 8 of our 11 models, so comparing cumulative weights between it and other variables would have been...unfair.

I think you should err on the side of having a smart/relevant model set. Throwing in more models just to keep things even seems contrary to the whole IC idea...so, no I wouldn't add new models just to make everything even.

I wonder though if it would be appropriate to run a subset of models with just 1 main effect each. Likely they will all be towards the bottom of the entire model list, but their respective AICs might be compared to the other 1-parameter models to get a measure of importance of individual variables alone.

'nother grad student here, so take all I say with a pound of salt.
cnagy
 
Posts: 8
Joined: Tue Nov 06, 2007 1:06 pm
Location: NYC

Re: right on the mark?

Postby dhewitt » Thu Oct 22, 2009 2:23 pm

bacollier wrote:I guess that the way I look at it is that tables of AIC-type information are fairly useless from an interpretive standpoint; what matters is the parameter estimates associated with the posited variables from the set of models. The really important part is what the model parameter estimates look like for the various models: what are the model-averaged estimates, SEs, CIs, etc., as several others have suggested.

I guess I would argue that AIC tables that just show models should be relegated to appendices or otherwise left out of the manuscript, as they really don't provide a whole lot of info other than a list of models and some model-selection statistics, and I think more focus should be put on the parameter estimates from those tables.


As an "ass" editor too, I really disagree with this view, particularly at the review stage. I think the bits of information requested in Anderson et al. (2001) are all essential and must be reviewed before anything is done with the estimates themselves. I recognize, of course, that the estimates are what we all care about, but they are very much dependent on the stuff in that model selection table! Especially as folks move to a more in-depth understanding of linear models (not just what type of ANOVA to run) and begin placing inference in a model-selection framework, these elements are critical for review.

Did they count the number of parameters correctly? (I've seen people count a single dummy-variable factor with >10 levels as 1 parameter!)

Did the -2logL go down with more parameters, indicating no funky estimation problems?

How much overall model selection uncertainty is there (perhaps indicating that the authors are trying to do too much with too little data)?

Does the model set make sense and are there other "smart" models that could/should be included?

This is just a sampling. To simply assume that the authors have done everything right and all is wonderful with the models themselves is a bad idea. And I see no reason not to include this material in the article even after review, but I can understand the tables being stuck in Appendices, sometimes.
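The -2logL check in that list is easy to automate. A small sketch, using hypothetical (number of parameters, -2logL) pairs for a sequence of nested models ordered from simplest to most general:

```python
# Hypothetical (num_params, neg2logL) pairs for nested models,
# ordered from simplest to most general.
fits = [(2, 480.6), (3, 474.2), (5, 471.9), (8, 471.9)]

def deviance_ok(fits):
    """Return False if any more-general model has a HIGHER -2logL than
    a simpler nested one, which would suggest an estimation problem."""
    return all(later <= earlier
               for (_, earlier), (_, later) in zip(fits, fits[1:]))
```

For properly maximized likelihoods of nested models, adding parameters can never increase -2logL, so a violation flags the kind of "funky estimation problem" mentioned above.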
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532

Murray and Conner paper

Postby dhewitt » Tue Nov 03, 2009 1:08 pm

cooch wrote:Except that cumulative weights aren't particularly robust, at least in some cases. While the approach is convenient, and perhaps informative in some cases, there are issues with how such weights can be influenced when the model set isn't symmetric with respect to all the variables of interest. Again, go read the following, which is the most recent treatment of the subject.

Murray & Conner 2009. Methods to quantify variable importance: implications for the analysis of noisy ecological data. Ecology 90: 348-355.


I agree that cumulative Akaike weights can be touchy wrt what Evan calls symmetry in the model set for the variables of interest. B&A 2002 are clear on this issue and note the problems. The solutions they provide are tentative and should always play second fiddle to a well-developed model set that makes explicit comparisons among variables of interest.

All that said, for model sets that involve linear models rather than capture-recapture or occupancy models, the variable-importance issue is a big one (although probably off-topic for this forum). As a result, I read the M&C paper with interest. After two readings I cannot see how it adds to the discussion. It seems poor in a number of respects (e.g., the simulations use a mean of 500 with a standard deviation of 1) and is a terrible assessment of cumulative Akaike weights. All due respect to the authors, of course. Perhaps if the models under consideration are all simple linear models (maybe multiple linear regressions, but I'm not even sure about that), the methods they discuss could be helpful; they are based on correlations. For capture-recapture and occupancy models, I don't think any of their conclusions are helpful. To be fair, the authors never said the methods applied to these models.
dhewitt
 
Posts: 150
Joined: Tue Nov 06, 2007 12:35 pm
Location: Fairhope, AL 36532

Re: Murray and Conner paper

Postby cooch » Tue Nov 03, 2009 1:36 pm

dhewitt wrote:To be fair, the authors never said the methods applied to these models.


And, to be fair, B&A never consider the cumulative AIC weights beyond simple multiple regressions either.

The larger point here is that a lot more work remains on how to robustly handle the importance of a 'factor'. While there is general agreement that effect size is the way to go for assessing an individual factor, it is less clear what to do when talking about the 'relative' importance of a factor.
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Postby darryl » Tue Nov 03, 2009 3:34 pm

Personally, I don't have a problem using AIC weights with an unbalanced model set (though I do try to avoid it whenever possible), as final inferences are always conditional on everything done up to that point (choice of models fit to the data, selected method of analysis, data entry, sampling, etc.); a problem with any of those things could invalidate your conclusions.

In terms of interpretation, to account for the unbalanced nature of the model set, instead of stopping at the (summed) model weights or evidence ratios, you could make an adjustment based on the frequency with which a factor appears in the model set. One suggestion: if w = (summed) AIC weight and f = frequency (e.g., an effect in 3 of 8 models gives f = 0.375), then use [w/(1-w)]/[f/(1-f)] as your measure of support for the importance of that factor. Values >>1 indicate support for the effect, values near 1 are inconclusive, and values <<1 indicate support against the effect being important. The values are asymmetric and range from 0 to infinity, so you may actually be better off doing this on the log scale. This type of approach may be worth thinking about whenever each factor is not in 50% of your models, regardless of balance.
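The adjustment above is simple to compute; a minimal sketch in Python, using the hypothetical 3-of-8 example:

```python
import math

def importance_odds(w, f):
    """Odds of the summed AIC weight divided by the odds of the
    factor's frequency in the model set: [w/(1-w)] / [f/(1-f)].
    >>1 supports the effect, ~1 is ambiguous, <<1 argues against it."""
    return (w / (1 - w)) / (f / (1 - f))

# Effect in 3 of 8 models (f = 0.375) with a summed weight of 0.60:
ratio = importance_odds(0.60, 3 / 8)        # 2.5
log_ratio = math.log(ratio)                 # log scale is more symmetric
```

Note that when the summed weight exactly equals the frequency (w = f), the ratio is 1, i.e., no evidence either way, which is the intuition behind the adjustment.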
Cheers
Darryl
darryl
 
Posts: 498
Joined: Thu Jun 12, 2003 3:04 pm
Location: Dunedin, New Zealand

Re: AIC is unacceptable

Postby jCeradini » Fri Jun 05, 2015 11:34 am

Very recent comment/elaboration from Burnham on summing model weights and relative variable importance. He does couch this approach within the context of all-subsets model selection, which is interesting (and also how it is presented in B&A 2002).

http://warnercnr.colostate.edu/~kenb/pd ... urnham.pdf
(posted on his website: http://warnercnr.colostate.edu/~kenb/)

Joe
jCeradini
 
Posts: 72
Joined: Mon Oct 13, 2014 3:53 pm
