Categorical variable for detection probability

Hello,

I am running single-species, single-season false-positive models on interview data in PRESENCE. I have three covariates for modeling detection probability: gender (categorical), age (continuous), and time spent (categorical).

I want to know how to enter the categorical variable in the design matrix of detection probability. I have two categories male and female which I have entered in the detection covariates file as follows:

1 2 3
A4 M M M
A5 M F M
A6 F M M

I have converted them into two indicator variables where Male is represented by 1 and female is represented by 0. The detection probability design matrix is as follows:

b1 b2 b3 b4

p1(1) 1 0 0 0
p1(2) 1 0 0 0
p1(3) 1 0 0 0
p10(1) 0 1 0 Gender
p10(2) 0 1 0 Gender
p10(3) 0 1 0 Gender
b1(1) 0 0 1 0
b1(2) 0 0 1 0
b1(3) 0 0 1 0

I have kept p1 and b1 constant, but p10 varies with the gender covariate. Is this the right way to enter categorical covariates in the detection probability design matrix, or is there some other way? The manuals explain how to enter categorical covariates for occupancy probability, but I could not find how to do it for detection probability.

Thank You
Prashant_mahajan

Posts: 21
Joined: Wed Mar 31, 2021 11:14 am

Re: Categorical variable for detection probability

I assume you meant to say you converted the sex categories into *one* indicator covariate (not two indicator variables). Specifically,

Gender=0 for female and Gender=1 for male.

Your design matrix is correct. The "beta" estimate in the output labelled, "b4" should be the sex effect on detection (difference in detection probability between female and male interviewees on the logit scale).

For the continuous covariate, you simply use the numerical value. The time-spent covariate can be done like the sex covariate, but if you have more than two possible categories, you'll need more indicator covariates. For example, if you have 3 categories (e.g., <1 hour, 1-3 hours, >3 hours), then you would need two indicator covariates. For this example, they would be:

durationMedium = 1 if interview was 1-3 hours, 0 otherwise
durationLong = 1 if interview was > 3 hours, 0 otherwise

In general, if you have a categorical covariate which has N possible categories, you need N-1 indicator covariates.
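The N-1 rule can be sketched in a few lines of Python. This is only an illustration, not something PRESENCE itself runs: the function name `indicator_covariates`, the category labels, and the four example interviews are all hypothetical. The category left out (here "short") becomes the reference level that the intercept describes.

```python
def indicator_covariates(values, categories, reference):
    """Return {category: [0/1, ...]} for every category except the reference."""
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical time-spent categories for four interviews.
durations = ["short", "long", "medium", "short"]

indicators = indicator_covariates(
    durations,
    categories=["short", "medium", "long"],
    reference="short",  # <1 hour is the reference level, so it gets no column
)

durationMedium = indicators["medium"]  # 1 where the interview was 1-3 hours
durationLong = indicators["long"]      # 1 where the interview was >3 hours
```

With 3 categories this yields exactly 2 indicator columns, each of which would go into the covariate file as its own covariate.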
jhines

Posts: 587
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Categorical variable for detection probability

Thank you for the reply. After running the model with gender as a covariate for detection probability my beta estimates are as follows:

estimate std.error
A1 psi.a1 : -0.783792 0.201226
B1 p(5).b1 : 1.063840 0.172170
B2 p10(5).b2 : -4.566087 1.018033
B3 b(5).b3 : 2.318050 0.302540
B4 p10(1).Gender : -25.062993 125774.009571

The beta estimate "B4" is negative; what does that signify? For a continuous covariate, it means the covariate has a negative influence on the parameter, but what does it mean for a categorical variable? The survey-specific values for the 1st survey at a few sites are as follows:

Site estimate Std.err 95% conf. interval
p10(1) 1 A4 : 0.0103 0.0104 0.0014 - 0.0710
p10(1) 2 A5 : 0.0103 0.0104 0.0014 - 0.0710
p10(1) 3 A6 : 0.0103 0.0104 0.0014 - 0.0710
p10(1) 4 A7 : 0.0103 0.0104 0.0014 - 0.0710

and the covariate matrix for sex is as follows:
R1 R2 R3 R4 R5
A4 1 1 1 1 1
A5 1 1 1 1 1
A6 0 1 1 0 1
A7 1 1 1 0 1

If we estimate p10 for the first replicate at site A4 using the equation exp(B2 + B4*1)/(1 + exp(B2 + B4*1)), it shows the same value as site A6 for the first replicate, which has the equation exp(B2 + B4*0)/(1 + exp(B2 + B4*0)). Likewise, we should get values for each site and each replicate for males and females. But how come the values are the same for different sites for the first replicate?

I am not able to interpret my results.
Prashant_mahajan


Re: Categorical variable for detection probability

The “b4” estimate is the estimate of the difference in false-positive detection probability between females and males (p10(males) – p10(females)) on the logit scale. Since this is negative, males have a lower probability of false-positive detection than females.
The section of the design matrix for false-positive detection is:

....... b1 b2 b3 b4
p10(1) 0 1 0 Gender
p10(2) 0 1 0 Gender
p10(3) 0 1 0 Gender

We can translate this into a series of equations as:

logit(p10(1)) = 0*b1 + 1*b2 + 0*b3 + Gender*b4
logit(p10(2)) = 0*b1 + 1*b2 + 0*b3 + Gender*b4
logit(p10(3)) = 0*b1 + 1*b2 + 0*b3 + Gender*b4

Simplifying and applying the inverse-logit function gives:

p10(1) = exp(b2 + Gender*b4)/(1+exp(b2+Gender*b4))
p10(2) = exp(b2 + Gender*b4)/(1+exp(b2+Gender*b4))
p10(3) = exp(b2 + Gender*b4)/(1+exp(b2+Gender*b4))

Since Gender=0 for females, false-positive detection probability for females is:

p10(females) = exp(b2)/(1+exp(b2)) = 0.0103

And false-positive detection for males (Gender=1) is:

p10(males) = exp(b2+1*b4)/(1+exp(b2+1*b4)) = 0.0000

The negative value for b4 made the estimate for males smaller than the estimate for females.
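As a quick check, the two probabilities can be reproduced from the beta estimates posted earlier in the thread (b2 = -4.566087, b4 = -25.062993). This is a minimal Python sketch; `inv_logit` is a helper written for this example, not a PRESENCE function.

```python
import math

def inv_logit(x):
    """Inverse-logit: exp(x)/(1+exp(x)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-x))

# Beta estimates copied from the model output posted above.
b2 = -4.566087    # p10 intercept (females, Gender=0)
b4 = -25.062993   # Gender effect on p10 (logit scale)

p10_female = inv_logit(b2 + 0 * b4)
p10_male = inv_logit(b2 + 1 * b4)

print(f"p10(females) = {p10_female:.4f}")
print(f"p10(males)   = {p10_male:.4f}")
```

The female value rounds to 0.0103 and the male value is vanishingly small, which is why it prints as 0.0000 to four decimals.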

It’s a good thing you’re looking at the real estimates and checking if they make sense. If your covariate data has the first two sites as “M” and the third site as “F” for survey 1, then the first two estimates should be 0.0 and the third site should be 0.0103. There might be a problem in how you entered your covariate data. You have only 3 surveys in the design matrix, but 5 surveys in the covariate data.

If correcting the covariate data doesn’t fix the problem, I’d be happy to diagnose… just send me the most recent zipfile in your project folder.
Jim
jhines


Re: Categorical variable for detection probability

Thank you for clarifying the "B4" parameter. It was really helpful. The covariate data that I entered has 5 replicates; I showed just 3 here as an example. Even after correcting the file, the first three sites have the same values for the b4 estimate for the 1st replicate. However, the values should have been identical for the first two sites (males) and different for the third site (female). I am sending you the most recent backup file. It would be really helpful if you could go through it and let me know if there are any errors that need to be corrected.

Thank You
Prashant_mahajan


Re: Categorical variable for detection probability

Thanks for sending the file. There was a bug in that version of Presence (2.13.18) in the printing of the p10 estimates. Please download the latest version of Presence (2.13.39).

I noticed your "Age" covariate ranged in value from 21 to 87. I suggest scaling that covariate, perhaps by dividing each value by 100, in order to prevent numerical problems in the calculations. (Computer roundoff-error can become a problem when the argument of the exponential function is large.)
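A small Python sketch of that rescaling follows. The individual ages and the slope value are hypothetical (only the 21 to 87 range comes from the data). It also shows that dividing the covariate by 100 does not change the fitted model: the beta simply becomes 100 times larger, so it is read as the effect per 100 years rather than per year.

```python
import math

# Hypothetical ages; only the 21-87 range is taken from the thread.
ages = [21, 34, 58, 87]
age_scaled = [a / 100 for a in ages]   # now between 0.21 and 0.87

beta_raw = 0.03                # hypothetical slope per year of age
beta_scaled = beta_raw * 100   # equivalent slope per 100 years

# Both parameterisations give the same contribution on the logit scale.
same_contribution = all(
    math.isclose(beta_raw * a, beta_scaled * s)
    for a, s in zip(ages, age_scaled)
)
print(age_scaled, same_contribution)
```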
jhines


Re: Categorical variable for detection probability

Thank you again.
Prashant_mahajan


Re: Categorical variable for detection probability

Hello,

For a categorical covariate with N categories, there should be N-1 indicator variables. Suppose I have 5 categories (A, B, C, D, and E) for a particular covariate; then there will be 4 indicator variables. The data for the categorical covariate look like this:

1 2 3 4 5
i A A C A E
ii B D B B A
iii C D A E B

Therefore, the indicator variables will be like this:

A
1 1 0 1 0
0 0 0 0 1
0 0 1 0 0

B
0 0 0 0 0
1 0 1 1 0
0 0 0 0 1

C
0 0 1 0 0
0 0 0 0 0
1 0 0 0 0

D
0 0 0 0 0
0 1 0 0 0
0 1 0 0 0

In the design matrix for false-positive detection probability, should I enter all the indicator variables together, or should I enter one at a time each category whose effect on the false-positive detection probability I want to know? I have entered the data in the design matrix like this:

b1 b2 b3 b4 b5 b6 b7

p1(1) 1 0 0 0 0 0 0
p1(2) 1 0 0 0 0 0 0
p1(3) 1 0 0 0 0 0 0
p1(4) 1 0 0 0 0 0 0
p1(5) 1 0 0 0 0 0 0
p10(1) 0 1 0 A B C D
p10(2) 0 1 0 A B C D
p10(3) 0 1 0 A B C D
p10(4) 0 1 0 A B C D
p10(5) 0 1 0 A B C D
b1(1) 0 0 1 0 0 0 0
b1(2) 0 0 1 0 0 0 0
b1(3) 0 0 1 0 0 0 0
b1(4) 0 0 1 0 0 0 0
b1(5) 0 0 1 0 0 0 0

Also, the value for each category on the logit scale will be calculated like this:

logit(p10(1)) = 0*b1 + 1*b2 + 0*b3 + A*b4 + B*b5 + C*b6 + D*b7

Will the value for category "E" be the intercept only, i.e., "b2", or will it be something else?

Thank you
Prashant_mahajan


Re: Categorical variable for detection probability

Yes, you are correct. If the covariate at a site/survey is category "E", then the indicator covariates A, B, C, D will all equal zero and the formula becomes logit(p10(1)) = 0*b1 + 1*b2 + 0*b3 + 0*b4 + 0*b5 + 0*b6 + 0*b7 = b2. So, the beta estimate, b2, is the "intercept". The beta associated with "A" is b4 and represents the difference in false-positive detection (on the logit scale) between the intercept (E) and category A. The beta estimate, b5, is the difference between the intercept and category B. The beta associated with the intercept, b2, is not very meaningful on its own, as it is only the difference between category E and a probability of zero. Computing the inverse-logit of b2 gives you the probability of false-positive detection for category E. Computing the inverse-logit of (b2+b4) gives you the probability of false-positive detection for category A.
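To make the reference-category logic concrete, here is a rough Python sketch. The beta values are invented purely for illustration (they are not from any model output), and `inv_logit` and `p10_for` are helper names made up for this example.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical beta estimates, for illustration only.
b2 = -2.0  # intercept: logit(p10) for the reference category E
effects = {"A": 0.5, "B": -0.3, "C": 1.0, "D": 0.0}  # b4, b5, b6, b7

def p10_for(category):
    """False-positive detection probability for one category.

    For category E all four indicators are 0, so only b2 remains.
    For any other category exactly one indicator is 1, adding its beta.
    """
    return inv_logit(b2 + effects.get(category, 0.0))

for cat in "ABCDE":
    print(cat, round(p10_for(cat), 4))
```

A positive beta (A, C here) puts that category above E, a negative one (B) puts it below, and a beta of zero (D) makes it identical to E.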
jhines


Re: Categorical variable for detection probability

Thank you so much for the clarification.
Prashant_mahajan
