We are attempting to estimate hazard rates as a function of covey membership during each weekly observation interval, for known fates data using the log(-log) link function.
Birds typically move to new coveys when the original covey's size reaches some minimum threshold (~5 birds/covey). However, some bird movements appear "random" to some degree (they likely are not, but we do not currently have data to support any theory describing these movements).
We have a data set w/ 14 weekly observation periods and ~14 coveys in which birds could be located throughout the study.
The question concerns how best to deal with a potentially large set of dummy covariate values for individual encounter histories. It does not appear best to code the data as such:
/*010231 */ 0010001010101010101010101000 1 2 2 3 3 4 ....
with covariates indicating membership in a particular covey at interval "x" (not counting the individual identifier in the comment field). This is particularly problematic if you standardize covariates, since these data are categorical in nature and membership in covey 6 does not imply twice the effect of membership in covey 3.
It would seem best to code these data by creating a large set of covariate values (essentially, number of covariates = encounter periods * number of coveys, so in our case 196, assuming 14 weeks of observation and 14 coveys), with 0 coding for non-membership through the interval and 1 coding for membership through the interval. Then, a design matrix such as
intercept (covariates 1-14) time 0 0 0 ...
0 0 0 intercept (covariates 15-28) time...
...
should accomplish the task at hand.
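Whether or not this is the best route, the 0/1 expansion described above can be sketched as follows (a minimal sketch, assuming each bird's membership is recorded as a list of weekly covey IDs, 1-14, with None for weeks the bird was not assigned to a covey; all names here are hypothetical, not MARK syntax):

```python
N_WEEKS = 14
N_COVEYS = 14

def dummy_covariates(membership):
    """Expand one bird's weekly covey IDs into N_WEEKS * N_COVEYS
    0/1 indicators, ordered week-major to match the design matrix
    above (covariates 1-14 = week 1, 15-28 = week 2, ...)."""
    row = [0] * (N_WEEKS * N_COVEYS)
    for week, covey in enumerate(membership):
        if covey is not None:
            row[week * N_COVEYS + (covey - 1)] = 1
    return row

# Hypothetical example: a bird in covey 3 for weeks 1-2, then covey 7.
history = [3, 3] + [7] * 12
row = dummy_covariates(history)  # 196 values, exactly one 1 per week
```

The week-major ordering is what lets the block-diagonal design matrix pick out "the 14 covey indicators for interval x" as a contiguous run of columns.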
Can MARK handle the number of covariates with which we are dealing?
Would the analysis be better conducted using SAS (PROC PHREG)?
Is there a better way to code the data to allow this analysis (e.g., w/o the somewhat nasty coding of all the covariates)?
many thanks in advance!
cheers,
brant