Page 1 of 1

RMark coding for groups in nest survival model

PostPosted: Mon Nov 18, 2013 2:42 pm
by B.K. Sandercock
Marklist:

This comment is based on the nest survival example on page C-89 of Appendix C of the Mark manual. The input file for the mallard example has the different habitat groups coded as four individual covariates for native, planted, wetland and roadside. The example dataset also has a few nests not assigned to any of these four strata. The input file could be recoded with a single variable with a categorical response variable. I drop the unknown and roadside nests to make comparisons to the example in Appendix C. A note of caution for RMark users - it looks like the way that groups are handled can have an effect on the parameter count for the models. R script as follows.

Code: Select all
library(RMark)
data(mallard)

# Treat dummy variables for habitat types as factors
mallard$Native=factor(mallard$Native)
mallard$Planted=factor(mallard$Planted)
mallard$Wetland=factor(mallard$Wetland)
mallard$Roadside=factor(mallard$Roadside)

# coding habitat as a single variable
mallard$Habitat <- "Unknown"
mallard$Habitat[mallard$Native==1] <- "Native"
mallard$Habitat[mallard$Planted==1] <- "Planted"
mallard$Habitat[mallard$Wetland==1] <- "Wetland"
mallard$Habitat[mallard$Roadside==1] <- "Roadside"
# drop nests in unknown and roadside habitats
mallard <- subset(mallard, mallard$Habitat!="Unknown" & mallard$Habitat!="Roadside")
mallard$Habitat=factor(mallard$Habitat)

# Examine a summary of the groups
summary(mallard$Habitat)

# Write a function for evaluating a set of competing models
run.mallard=function()
{

# 2. DSR varies by habitat type treats habitats as factors
# and the output provides S hats for each habitat type
Hab=mark(mallard,nocc=90,model="Nest",
model.parameters=list(S=list(formula=~Native+Planted+Wetland)),
groups=c("Native","Planted","Wetland"))

# 2A. Alternative coding for habitat model
Hab2=mark(mallard,nocc=90,model="Nest",
model.parameters=list(S=list(formula=~Habitat)), groups="Habitat")

return(collect.models() )
}

mallard.results=run.mallard()
mallard.results


The [edited] parameter estimates and model selection from running this script then look like:
Code: Select all

Output summary for Nest model
Name : S(~Native + Planted + Wetland)

Npar :  4  (unadjusted=3)
-2lnL:  1445.949
AICc :  1453.956  (unadjusted=1451.953)

Beta
               estimate se       lcl       ucl
S:(Intercept) 2.2750844  0 2.2750844 2.2750844
S:Native1     0.5878466  0 0.5878466 0.5878466
S:Planted1    0.8105261  0 0.8105261 0.8105261
S:Wetland1    0.6807600  0 0.6807600 0.6807600

Real Parameter S
                                        1
Group:Native1.Planted0.Wetland0 0.9459833
Group:Native0.Planted1.Wetland0 0.9562953
Group:Native0.Planted0.Wetland1 0.9505390

Output summary for Nest model
Name : S(~Habitat)

Npar :  3
-2lnL:  1445.949
AICc :  1451.953

Beta
                  estimate        se        lcl       ucl
S:(Intercept)    2.8629310 0.0992682  2.6683653 3.0574967
S:HabitatPlanted 0.2226796 0.1273492 -0.0269249 0.4722841
S:HabitatWetland 0.0929133 0.2454493 -0.3881674 0.5739940

Real Parameter S
                             1
Group:HabitatNative  0.9459833
Group:HabitatPlanted 0.9562953
Group:HabitatWetland 0.9505390

> mallard.results
                           model npar     AICc DeltaAICc    weight Deviance
2                    S(~Habitat)    3 1451.953  0.000000 0.7313373 1445.949
1 S(~Native + Planted + Wetland)    4 1453.956  2.002837 0.2686627 1445.949


Both models produce the same parameter estimates for the three habitats and have the same deviance. However, the model based on habitats coded as individual covariates has an adjusted parameter count of four because the model has an intercept plus one parameter for each of three habitat levels. In Table 3 of Rotella et al. 2004, the parameter count for this model is given as K=4. The model for the recoded Habitat variable has a parameter count of three, which then affects the AICc value, delta-AICc and model rankings.

I suggest K=3 should be the correct parameter count for a model with three levels of habitat. If the model is fit with groups as individual covariates, it might be best to use the unadjusted count for K. Posting this information to the forum in case anybody else is working through the examples to learn RMark.

Cheers, Brett.

Re: RMark coding for groups in nest survival model

PostPosted: Mon Nov 18, 2013 6:13 pm
by jlaake
It would be good for Jay Rotella to jump in here when he can. Jay contributed the example and I'm not that familiar with the data. But if you only have 3 levels of habitat then there should only be 2 individual dummy variables because the third acts as an intercept. Could it be that the other "roadside" and "unknown" are being lumped into the intercept? Or it may be that I have the code incorrect in the example in Appendix C.

You are also correct that treating habitat as a group variable is often the better way to go because then MARK will output a real value for each level of the factor so you needn't use individual covariates and predictions. I'm not sure what Jay's motivation was for coding it that way. As with any problem there are often many solutions.

regards --jeff

Re: RMark coding for groups in nest survival model

PostPosted: Mon Nov 18, 2013 6:37 pm
by Rotella
The original example dataset has nests reported in 4 habitat types. When the example was built, I simply used dummy or indicator variables to code the nests. If I were doing it today, I'd use the grouping approach for the reason Jeff mentions in his note.

Regardless of how you enter the information (groups or indicator variables presented as individual covariates), you get the same answers and all is fine with either approach.

In Brett's example, he dropped the Roadside nests and 4 nests in unknown habitat. So, the reduced dataset, only has nests in 3 habitat types. Thus, the model represented as:
Hab=mark(mallard,nocc=90,model="Nest",
model.parameters=list(S=list(formula=~Native+Planted+Wetland,
groups=c("Native","Planted","Wetland"))
is overparameterized as evidenced by the fact that the SEs for the beta-hats are all reported as being 0.

If you reduce the model to have only 3 terms, which is appropriate for the reduced dataset, all is fine.
The reduced model can be achieved in several ways. First, you could drop one of the 3 habitat types from the list, e.g.,
Hab=mark(mallard,nocc=90,model="Nest",
model.parameters=list(S=list(formula=~Planted+Wetland,
groups=c("Native","Planted","Wetland"))

A second approach is to forcibly remove the intercept (see the 3rd line below).
Hab=mark(mallard,nocc=90,model="Nest",
model.parameters=list(S=list(formula=~Native+Planted+Wetland,
remove.intercept=TRUE)),
groups=c("Native","Planted","Wetland"))

Results from either of these approaches matches Brett's 2nd model and all correctly report k=3 parameters as being estimated.

I hope this helps.
Jay

Re: RMark coding for groups in nest survival model

PostPosted: Mon Nov 18, 2013 6:49 pm
by Rotella
It might be worth noting why in Table 3 of Rotella et al. 2004, the parameter count for the habitat model is given as K=4. For the model in question, daily survival rate was modeled as being different in each of 4 habitat types. We modeled the rates using an intercept and 3 indicator variables: 1 for Native, 1for Planted, and 1 for Wetland cover. Thus, the intercept was for Roadside nests (the baseline group) and 4 parameters were in fact estimated.

The log-odds of daily survival for nests in Roadside cover = beta_0
The log-odds of daily survival rate for nests in Native cover = beta_0 + beta_1
The log-odds of daily survival rate for nests in Planted cover = beta_0 + beta_2
The log-odds of daily survival rate for nests in Wetland cover = beta_0 + beta_3

Re: RMark coding for groups in nest survival model

PostPosted: Tue Nov 19, 2013 2:03 pm
by B.K. Sandercock
A somewhat trivial point but there are actually five groups in the mallard dataset because there are four nests not assigned to any of the four habitat strata. This led to some initial confusion for me in working through the example.

If you run the following model without dropping any nests you do get four estimates. However, I believe the baseline group is not Roadside nests, but the combined pool of Roadside and Unknown nests:
Code: Select all
S(~Native + Planted + Wetland)
Real Parameter S
                                        1
Group:Native0.Planted0.Wetland0 0.9558094
Group:Native1.Planted0.Wetland0 0.9459833
Group:Native0.Planted1.Wetland0 0.9562953
Group:Native0.Planted0.Wetland1 0.9505390

If you code the model by habitat and assign the four nests with zero for each covariate group to "Unknown", you get five estimates, and three estimates that match the first model for Native, Planted and Wetland, but not Roadside.
Code: Select all
Name : S(~Habitat)
Real Parameter S
                              1
Group:HabitatNative   0.9459833
Group:HabitatPlanted  0.9562953
Group:HabitatRoadside 0.9540497
Group:HabitatUnknown  0.9647980
Group:HabitatWetland  0.9505390

If you drop the unknown nests and rerun the first model, the baseline group should be Roadside nests alone. The parameter estimate for Roadside nests (i.e., Native0.Planted0.Wetland0) is slightly different from the first model but would not affect any conclusions. For a different dataset with more nests with missing data I suppose this could be an issue.
Code: Select all
Name : S(~Native + Planted + Wetland)
Real Parameter S
                                        1 
Group:Native0.Planted0.Wetland0 0.9540498
Group:Native1.Planted0.Wetland0 0.9459833
Group:Native0.Planted1.Wetland0 0.9562953
Group:Native0.Planted0.Wetland1 0.9505390

Thanks to Jeff and Jay for the helpful feedback and coding tips!

Regards, Brett.

Re: RMark coding for groups in nest survival model

PostPosted: Tue Nov 19, 2013 2:17 pm
by Rotella
Sorry for the confusion regarding the 4 nests that were mistakenly not assigned to a habitat type in the example. They do cause problems if you use groups in some arrangements. I'll send an updated data file to Jeff in which those 4 nests are assigned to Roadside. In the meantime, here's RMark code that might be useful to someone trying to sort this out.

Code: Select all
library(RMark)
data(mallard)

# run the model using indicator variables
# intercept refers to log-odds of survival for nests in Roadside + 4 nests
# without habitat classification
Hab4.inds=mark(mallard,nocc=90,model="Nest",
               model.parameters=list(S=list(formula=~Native+Planted+Wetland)))

# For 4 nests that weren't assigned to any of the 4 habitat types,
# assign them to Roadside so that models that remove the intercept
# will work properly, i.e., each nest needs to be in 1 of the groups
# Step 1. find the rows for nests without habitat assignment
no.hab=which(mallard$Native==0 & mallard$Planted==0 & mallard$Roadside==0 & mallard$Wetland==0)
# rows 245, 412, 413, & 449 need fixing
# Step 2. assign those 4 nests to roadside
mallard$Roadside[no.hab]=1

# Treat indicator variables for habitat types as factors
# so that they can be used to assign rows to groups
mallard$Native=factor(mallard$Native)
mallard$Planted=factor(mallard$Planted)
mallard$Wetland=factor(mallard$Wetland)
mallard$Roadside=factor(mallard$Roadside)

# a version of the model based on 4 groups that's over-parameterized
Hab4.overpar=mark(mallard,nocc=90,model="Nest",
         model.parameters=list(S=list(formula=~Native+Planted+Roadside+Wetland)),
         groups=c("Native","Planted","Roadside","Wetland"))

# a version of the model based on 4 groups that's fine
Hab4.grpsOK=mark(mallard,nocc=90,model="Nest",
          model.parameters=list(S=list(formula=~Native+Planted+Roadside+Wetland,
                                       remove.intercept=TRUE)),
          groups=c("Native","Planted","Roadside","Wetland"))

# another version of the model based on 1 intercept + 3 groups that's fine
Hab4.3grpsOK=mark(mallard,nocc=90,model="Nest",
                 model.parameters=list(S=list(formula=~Native+Planted+Wetland)),
                 groups=c("Native","Planted","Roadside","Wetland"))