Coding of site covariates

questions concerning analysis/theory using program PRESENCE

Coding of site covariates

Postby rk1177 » Sat Apr 03, 2010 9:02 pm

Dear readers,
I’m a new PRESENCE user, attempting to run a multi-season occupancy analysis. I would like to model psi, gamma and eps as a function of a site covariate: management unit (5 groups). From what I’ve read in earlier posts, it is possible to code such a covariate either categorically or continuously, depending on whether it is important to preserve an ordered relationship or not.

Is there any advantage to dummy coding a covariate in PRESENCE?
Unit 2 1 0 0 0
Unit 3 0 1 0 0
Unit 4 0 0 1 0
Unit 5 0 0 0 1
Unit 7 0 0 0 0

I’ve compared models with Unit coded categorically versus continuously (see below). Categorical:
Unit
2
3
4
5
7
Continuous:
Unit
1
2
3
4
5
The models have equal support, but their output differs slightly. Both model outputs give an estimate of psi and eps = 0. However, the categorical model output gives gamma = 0 and the continuous model gives gamma = 1. If both methods of coding are valid (assuming I've done it correctly), why does the estimate of gamma differ?

Thanks in advance for your expertise,
R
rk1177
 
Posts: 7
Joined: Thu Apr 01, 2010 9:37 pm

Re: Coding of site covariates

Postby jhines » Mon Apr 05, 2010 10:57 am

The two codings you've listed are both 'continuous'. The only difference is that you've changed the numerical relationship between the units. In the 2nd coding, psi (or gamma or whatever) for unit 2 will be twice as big as psi for unit 1 (on the logit scale), assuming your design matrix for psi did not have an intercept term. In the 1st coding, psi for unit 2 will be 50% (3/2) bigger than psi for unit 1. You probably don't want to use this type of model.

What you want for a categorical relationship would be to add 'dummy' categorical variables. Instead of a single site covariate named 'unit', you need 5 site covariates, named 'unit1','unit2','unit3','unit4','unit5'. Each of these site covariates would be 1 if the the site belonged to that unit, and 0 if not. Then, your design matrix for psi would look like this:

Code: Select all
Occupancy
-       a1     a2     a3     a4     a5
psi1  unit1  unit2  unit3  unit4  unit5


This would allow initial occupancy to be different for each unit, but no numerical relationship between the units.

Assuming there is a reason for the different units to have a linear increasing or decreasing occupancy parameter, you could code the 'continuous' relationship with the unit covariate as:

unit2 sites - unit covariate=0
unit3 sites - unit covariate=1
unit4 sites - unit covariate=2
unit5 sites - unit covariate=3
unit7 sites - unit covariate=4

Then use the following design matrix for psi:

Code: Select all
Occupancy
-     a1   a2
psi1  1  unit


So, logit(psi) = 1*a1 + unit*a2. or

Logit(Psi(unit1)) = a1
Logit(Psi(unit2))=a1+a2
Logit(Psi(unit3))=a1+2*a2
Logit(Psi(unit4))=a1+3*a2
Logit(Psi(unit5))=a1+4*a2
Logit(Psi(unit5))=a1+5*a2

Jim
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Coding of site covariates

Postby rk1177 » Wed Apr 07, 2010 8:52 am

Thanks much!
rk1177
 
Posts: 7
Joined: Thu Apr 01, 2010 9:37 pm

Re: Coding of site covariates

Postby rk1177 » Fri Apr 09, 2010 11:06 am

A follow up question: If I chose to dummy code Unit, would it be valid to have PRESENCE estimate an intercept? So the design matrix would look like:

a1 a2 a3 a4 a5 a6
psi 1 Unit1 Unit 2 Unit 3 Unit 4 Unit 5
rk1177
 
Posts: 7
Joined: Thu Apr 01, 2010 9:37 pm

Re: Coding of site covariates

Postby jhines » Fri Apr 09, 2010 11:21 am

If you want to estimate a parameter for 5 units, you need to estimate 5 beta's (a1,a2,a3,a4,a5). If you add an intercept, you'll be estimating 6 beta's.

The way I suggested, psi is estimated independently for each unit. (eg., logit(psi(unit1))=a1, logit(psi(unit2))=a2,...)

Another way to get the same occupancy estimates would be to make one unit a 'reference' unit, and estimate the difference between this unit and the others. To do this, the design matrix would look like this:
Code: Select all
    a1    a2     a3     a4     a5
psi  1  unit2  unit3  unit4  unit5


I made unit1 the 'reference' unit, but could have made any of the others the 'reference' unit. With this design matrix, logit(psi(unit1))=1*a1, logit(psi(unit2))=1*a1+1*a2, logit(psi(unit3))=1*a1+1*a3,...

Notice that I still only estimate 5 beta's (a1-a5). Both design matrices will give the same estimates of psi for each unit.

Jim
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Coding of site covariates

Postby rk1177 » Fri Apr 09, 2010 12:36 pm

Thanks for the speedy reply!

I just tried both coding methods that you described, but I didn’t get the same estimates of psi and the models had different AIC values.

My model input for estimating psi for each unit independently (AIC = 43.13, delta AIC = 0):

- a1 a2 a3 a4 a5
psi Unit_2 Unit_3 Unit_4 Unit_5 Unit_7

Betas:
estimate std.error
A1 :occupancy psiUnit_2 -43.782931 (0.000000)
A2 :occupancy psiUnit_3 -49.318625 (0.000000)
A3 :occupancy psiUnit_4 114.036615 (0.000001)
A4 :occupancy psiUnit_5 -35.796095 (-1.#IND00)
A5 :occupancy psiUnit_7 -142.120110 (0.000001)


My model input for dummy coding the units (AIC = 52.54, delta AIC = 9.41):

- a1 a2 a3 a4 a5
psi 1 Unit_2 Unit_3 Unit_4 Unit_5

Betas:
estimate std.error
A1 :occupancy psi -25.955013 (6.786227)
A2 :occupancy psiUnit_2 1.213120 (34.903719)
A3 :occupancy psiUnit_3 1.898660 (14.805456)
A4 :occupancy psiUnit_4 2.272986 (6.081744)
A5 :occupancy psiUnit_5 3.727495 (2.408107)


Not sure why there’s a discrepancy – I believe I’m running the models correctly.

Thanks again.
rk1177
 
Posts: 7
Joined: Thu Apr 01, 2010 9:37 pm

Re: Coding of site covariates

Postby jhines » Fri Apr 09, 2010 1:07 pm

I expected the 'real' parameter (psi) to be the same for both ways of coding the design matrix, but not all of the beta's. The first beta (for unit2 - reference unit) should be similar for both models, and even though they don't look close ( -43.78293 vs -25.955013), they both yield an estimate of psi (nearly) = 0.0. The other beta's should not be equal since they represent the differences between the other units and the reference unit in the 2nd coding.

The 'real' parameter estimates for the first coding are 0,0,1,0,0. And, for the 2nd coding they are 0,0,0,0,0. My guess is that this is a sparse data-set which is causing the likelihood function to not be smooth. By looking at the likelihood values for both models, you can see that the 2nd model has not found the maximum value. I think this may have happened because the optimization routine searched for and found the best value for the intercept beta (-25.95), but one of the other beta's (for unit 4) was very far away from the starting value (and the other beta's). I suspect that if you entered better starting values (eg., -2,0,4,0,0) the 2nd model would give the same estimates of psi as the first.

This is an example that shows that it pays to make sure the estimates that come out of the program make sense. If you look at the data, there should only be detections for unit 4, since occupancy is estimated as zero for the other units. If the 2nd model were true, all of the data would be non-detections (zeros), and the occupancy analysis would be useless.
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA


Return to analysis help

Who is online

Users browsing this forum: No registered users and 1 guest