darryl wrote: I agree with Evan that having an a priori candidate set would be ideal, and something we should strive for, but I've had situations with multiple parameter types, each with a few different realistic structures/potential covariates, so you can quickly end up with 1000s of possible models. How to whittle that down to a practically-sized candidate set is usually easier said than done, in which case I've often resorted to something along the lines of what Andy (use a 'best' model for your 'other' parameters) or Robert (use the 'general' model for your other parameters) have suggested. And while not ideal, I think it's a reasonable approach provided that: (a) we're honest about what we've done when we write things up; and (b) we treat the results as exploratory rather than definitive. If anyone out there has practical suggestions on what to do when you have 100s or 1000s of possible models (even after thinking hard about what's biologically relevant/plausible, etc.), especially when you have multiple data types, I'd be interested to hear them.
Of course, if we were able to implement a formal experiment and control/manipulate the factors we thought were important, that would simplify things greatly.
Cheers
Darryl
I accept that on occasion there are some practical limits - but in my experience, if the plausible candidate model set has 1000s of models, then the analyst probably hasn't done a particularly good job of thinking through what truly constitutes 'plausible' (distinguishing between 'biological' and 'statistical' plausibility). Of course I don't mean Darryl here - statisticians like Darryl can always be excused for narrowness of biological insight (kidding, kidding...). Darryl makes several very good points (no surprise there), but I thought I'd add my two cents' worth in quick follow-up.
More often than not, careful consideration of prior results and biological insight (the 'warm and fuzzy' version of an informative prior) will winnow down most model sets. It's also important to consider whether the point of the exercise is to 'get a parameter estimate' or to 'test a biological hypothesis'. If the latter, then some selectivity is no doubt in order. A model set that is simply a large collection of (say) variants on the 'time-dependence' theme doesn't show much deep consideration (in many cases). Time-dependence is, in and of itself, boring - and irrelevant. It's strictly analogous to the null heterogeneity hypothesis in ANOVA - you do an ANOVA, and find the group means differ. Whoopee! What is more important is: what do the differences relate to? Similarly, finding that a time-dependent CJS model (for example) is a better model (pick your criterion) than (say) a dot model is not particularly interesting biologically. (And this must logically be true, since a dot model is conceptually impossible - no parameter is truly constant - even if your data are lousy enough to give more support to the dot model than to the time-dependent model.)

What is of interest is: what are the underlying drivers of the temporal variation? If you believe that certain extrinsic factors drive variation in some parameter, then your model set contains those models where the parameter of interest is constrained to be a function of the covariate(s) you think important. I would say that the majority of the time (in the work I've done), model structures for individual parameters run from dot models -> 2-4 constrained models -> time-dependent models - so, typically, 6-8 model structures per parameter. With n basal parameters and (say) 6-8 model structures each, the full factorial is (6-8)^n models - 36 to 64 models for two parameters - which is a lot more manageable than 'thousands'.
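To make that arithmetic concrete, here is a minimal sketch (Python, with entirely hypothetical structure and covariate names - substitute whatever is biologically defensible for your own parameters):

    # Candidate model set as the full factorial of per-parameter structures.
    # Structure/covariate names below are hypothetical placeholders.
    from itertools import product

    structures = {
        "phi": ["dot", "sex", "winter_severity", "sex+winter_severity", "time"],  # survival
        "p":   ["dot", "effort", "time"],                                         # detection
    }

    models = [
        "phi(" + phi + ") p(" + p + ")"
        for phi, p in product(structures["phi"], structures["p"])
    ]

    print(len(models))   # 5 x 3 = 15 models; with 6-8 structures for each of two parameters, 36-64
    for m in models:
        print(m)

The count is the product of the per-parameter structure counts, which is exactly why a bit of winnowing at the per-parameter level pays off so quickly - and why tossing in a handful of extra structures for each of several parameters blows the set up towards 'thousands'.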
I usually tell my students that if your candidate model set has >80 models, you probably need to do more work thinking about which models should be in the set. I don't dispute that this isn't always easy, but which is the preferable argument: 'here is my model set, and perhaps I did leave out some important, plausible models' versus 'I tried every model I could think of'? In the former case, the debate becomes one about the decisions - biologically motivated - of which models to consider. In the latter case, you end up 'telling stories' to justify the models you found to be more parsimonious. While I admit that this can be fun, it's not particularly comforting. And, as I noted earlier, I can construct data sets where this 'parameter stepwise' approach will quickly lead you to the wrong conclusion.
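For anyone unfamiliar with the shorthand, the 'parameter stepwise' shortcut looks roughly like the following sketch (Python; aicc_of() is just a placeholder for fitting the model in MARK and reading off AICc, and the structure names are hypothetical - nothing here is a real analysis, it only shows the mechanics of the shortcut being cautioned against):

    # 'Parameter-stepwise' shortcut: pick each parameter's structure while holding
    # the other parameters at a general (time-dependent) structure, then combine.
    # aicc_of() is a stand-in for fitting the model and extracting AICc; here it
    # just returns an arbitrary reproducible number so the search logic can be run.
    import hashlib

    structures = {"phi": ["dot", "sex", "winter_severity", "time"],
                  "p":   ["dot", "effort", "time"]}
    general = {"phi": "time", "p": "time"}

    def aicc_of(model):
        # placeholder: deterministic pseudo-AICc derived from the model name only
        name = "phi(%(phi)s) p(%(p)s)" % model
        return int(hashlib.md5(name.encode()).hexdigest(), 16) % 1000

    selected = {}
    for par in structures:
        candidates = []
        for s in structures[par]:
            model = dict(general)      # everything else held at the general structure
            model[par] = s
            candidates.append((aicc_of(model), s))
        selected[par] = min(candidates)[1]   # 'best' structure for this parameter alone

    print(selected)   # the combined model this shortcut settles on

Each parameter's structure is chosen with the others pinned at the general structure, and the 'winners' are then bolted together - which is precisely where the approach can go wrong, since the best structure for one parameter can depend on what you've assumed for the others.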
There is no doubt that you occasionally need to balance approaches. What Darryl (and previous posters) describe is not the most egregious thing you can do (heck, I've done it myself in some of my own work), but it is one that should be approached with considerable caution, and with complete upfront honesty.
There are still some technical issues (as noted in my earlier posts in this thread), but that's another debate.