www.phidot.org

by **rk1177** » Sat Apr 03, 2010 9:02 pm

Dear readers,
I’m a new PRESENCE user, attempting to run a multi-season occupancy analysis. I would like to model psi, gamma and eps as a function of a site covariate: management unit (5 groups). From what I’ve read in earlier posts, it is possible to code such a covariate either categorically or continuously, depending on whether it is important to preserve an ordered relationship or not.

Is there any advantage to dummy coding a covariate in PRESENCE?
Unit 2 1 0 0 0
Unit 3 0 1 0 0
Unit 4 0 0 1 0
Unit 5 0 0 0 1
Unit 7 0 0 0 0

I’ve compared models with Unit coded categorically versus continuously (see below). Categorical:
Unit
2
3
4
5
7
Continuous:
Unit
1
2
3
4
5
The models have equal support, but their output differs slightly. Both model outputs give an estimate of psi and eps = 0. However, the categorical model output gives gamma = 0 and the continuous model gives gamma = 1. If both methods of coding are valid (assuming I've done it correctly), why does the estimate of gamma differ?

Thanks in advance for your expertise,
R

by **jhines** » Mon Apr 05, 2010 10:57 am

The two codings you've listed are both 'continuous'. The only difference is that you've changed the numerical relationship between the units. In the 2nd coding, psi (or gamma or whatever) for unit 2 will be twice as big as psi for unit 1 (on the logit scale), assuming your design matrix for psi did not have an intercept term. In the 1st coding, psi for unit 2 will be 50% (3/2) bigger than psi for unit 1. You probably don't want to use this type of model.

What you want for a categorical relationship would be to add 'dummy' categorical variables. Instead of a single site covariate named 'unit', you need 5 site covariates, named 'unit1','unit2','unit3','unit4','unit5'. Each of these site covariates would be 1 if the the site belonged to that unit, and 0 if not. Then, your design matrix for psi would look like this:

Code: Select all: Occupancy - a1 a2 a3 a4 a5 psi1 unit1 unit2 unit3 unit4 unit5

This would allow initial occupancy to be different for each unit, but no numerical relationship between the units.

Assuming there is a reason for the different units to have a linear increasing or decreasing occupancy parameter, you could code the 'continuous' relationship with the unit covariate as:

unit2 sites - unit covariate=0
unit3 sites - unit covariate=1
unit4 sites - unit covariate=2
unit5 sites - unit covariate=3
unit7 sites - unit covariate=4

Then use the following design matrix for psi:

Code: Select all: Occupancy - a1 a2 psi1 1 unit

So, logit(psi) = 1*a1 + unit*a2. or

Logit(Psi(unit1)) = a1
Logit(Psi(unit2))=a1+a2
Logit(Psi(unit3))=a1+2*a2
Logit(Psi(unit4))=a1+3*a2
Logit(Psi(unit5))=a1+4*a2
Logit(Psi(unit5))=a1+5*a2

Jim

by **rk1177** » Wed Apr 07, 2010 8:52 am

Thanks much!

by **rk1177** » Fri Apr 09, 2010 11:06 am

A follow up question: If I chose to dummy code Unit, would it be valid to have PRESENCE estimate an intercept? So the design matrix would look like:

a1 a2 a3 a4 a5 a6
psi 1 Unit1 Unit 2 Unit 3 Unit 4 Unit 5

by **jhines** » Fri Apr 09, 2010 11:21 am

If you want to estimate a parameter for 5 units, you need to estimate 5 beta's (a1,a2,a3,a4,a5). If you add an intercept, you'll be estimating 6 beta's.

The way I suggested, psi is estimated independently for each unit. (eg., logit(psi(unit1))=a1, logit(psi(unit2))=a2,...)

Another way to get the same occupancy estimates would be to make one unit a 'reference' unit, and estimate the difference between this unit and the others. To do this, the design matrix would look like this:

Code: Select all: a1 a2 a3 a4 a5 psi 1 unit2 unit3 unit4 unit5

I made unit1 the 'reference' unit, but could have made any of the others the 'reference' unit. With this design matrix, logit(psi(unit1))=1*a1, logit(psi(unit2))=1*a1+1*a2, logit(psi(unit3))=1*a1+1*a3,...

Notice that I still only estimate 5 beta's (a1-a5). Both design matrices will give the same estimates of psi for each unit.

Jim

by **rk1177** » Fri Apr 09, 2010 12:36 pm

Thanks for the speedy reply!

I just tried both coding methods that you described, but I didn’t get the same estimates of psi and the models had different AIC values.

My model input for estimating psi for each unit independently (AIC = 43.13, delta AIC = 0):

- a1 a2 a3 a4 a5
psi Unit_2 Unit_3 Unit_4 Unit_5 Unit_7

Betas:
estimate std.error
A1

ccupancy psiUnit_2 -43.782931 (0.000000)
A2

ccupancy psiUnit_3 -49.318625 (0.000000)
A3

ccupancy psiUnit_4 114.036615 (0.000001)
A4

ccupancy psiUnit_5 -35.796095 (-1.#IND00)
A5

ccupancy psiUnit_7 -142.120110 (0.000001)

My model input for dummy coding the units (AIC = 52.54, delta AIC = 9.41):

- a1 a2 a3 a4 a5
psi 1 Unit_2 Unit_3 Unit_4 Unit_5

Betas:
estimate std.error
A1

ccupancy psi -25.955013 (6.786227)
A2

ccupancy psiUnit_2 1.213120 (34.903719)
A3

ccupancy psiUnit_3 1.898660 (14.805456)
A4

ccupancy psiUnit_4 2.272986 (6.081744)
A5

ccupancy psiUnit_5 3.727495 (2.408107)

Not sure why there’s a discrepancy – I believe I’m running the models correctly.

Thanks again.

by **jhines** » Fri Apr 09, 2010 1:07 pm

I expected the 'real' parameter (psi) to be the same for both ways of coding the design matrix, but not all of the beta's. The first beta (for unit2 - reference unit) should be similar for both models, and even though they don't look close ( -43.78293 vs -25.955013), they both yield an estimate of psi (nearly) = 0.0. The other beta's should not be equal since they represent the differences between the other units and the reference unit in the 2nd coding.

The 'real' parameter estimates for the first coding are 0,0,1,0,0. And, for the 2nd coding they are 0,0,0,0,0. My guess is that this is a sparse data-set which is causing the likelihood function to not be smooth. By looking at the likelihood values for both models, you can see that the 2nd model has not found the maximum value. I think this may have happened because the optimization routine searched for and found the best value for the intercept beta (-25.95), but one of the other beta's (for unit 4) was very far away from the starting value (and the other beta's). I suspect that if you entered better starting values (eg., -2,0,4,0,0) the 2nd model would give the same estimates of psi as the first.

This is an example that shows that it pays to make sure the estimates that come out of the program make sense. If you look at the data, there should only be detections for unit 4, since occupancy is estimated as zero for the other units. If the 2nd model were true, all of the data would be non-detections (zeros), and the occupancy analysis would be useless.

www.phidot.org

Coding of site covariates

Coding of site covariates

Re: Coding of site covariates

Re: Coding of site covariates

Re: Coding of site covariates

Re: Coding of site covariates

Re: Coding of site covariates

Re: Coding of site covariates

Who is online