www.phidot.org

by **Mateo** » Tue Jul 13, 2010 11:47 pm

I have a question about how to deal with data separation in a single season occupancy model.

One site covariate perfectly predicts absence (hence the data separation). If I use the covariate I get Beta = -37 and SE = 0, which makes sense given the data separation, but I expect reviewers to object to SE=0. Am I wrong about that?
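To illustrate the separation described above, here is a minimal Python sketch of a single-season occupancy likelihood with constant detection probability. The data are made up for illustration (they are not the data from this study), and the function name, the fixed intercept of -1 and p = 0.4 are all assumptions. When every site with the covariate present has no detections, the log-likelihood keeps creeping upward as the covariate's beta goes more negative and then flattens, so there is no finite maximum.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def occupancy_loglik(b0, b1, p, x, d, K):
    """Single-season occupancy log-likelihood (binomial constant dropped).

    logit(psi) = b0 + b1 * x, constant detection probability p,
    d = number of detections at each site out of K visits.
    """
    psi = expit(b0 + b1 * x)
    occupied = psi * p**d * (1.0 - p) ** (K - d)   # occupied, observed history
    unoccupied = (1.0 - psi) * (d == 0)            # only possible if never detected
    return float(np.sum(np.log(occupied + unoccupied)))

# Hypothetical data: 40 sites with x = 0 (10 of them with detections)
# and 20 sites with x = 1, none of which ever produced a detection.
x = np.array([0] * 40 + [1] * 20)
d = np.array([2] * 5 + [1] * 5 + [0] * 30 + [0] * 20)
K = 3

# Profile over b1 with the other parameters held at arbitrary fixed values.
profile = {b1: occupancy_loglik(-1.0, b1, 0.4, x, d, K)
           for b1 in (-1.0, -5.0, -10.0, -20.0, -37.0)}
for b1, ll in profile.items():
    print(f"b1 = {b1:6.1f}   logL = {ll:.10f}")
```

In this toy setup the log-likelihood at b1 = -20 and b1 = -37 differs only far past the decimal point, which is why an optimizer can stop almost anywhere on the plateau.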

My ideas:
1) Sweep the covariate under the rug and pretend it doesn’t exist. (Not really.)
2) Declare the covariate a perfect predictor, designate sites where it predicts absence as non-habitat, and analyze only the remaining sites.
3) Use a totally different analysis, like maybe a classification tree or bias-reduced logistic regression.
4) Post a question on the phidot phorum.

Any suggestions?

My apologies if this has already been addressed. I did a search for “data separation” and did not find any posts.

by **darryl** » Wed Jul 14, 2010 12:16 am

Congratulations on finding the holy grail! If it predicts absence perfectly then I presume it also predicts presence perfectly too. ;-)

2) What sort of analyses of the remaining sites are you thinking about? By definition they must all be occupied.
3) These don't allow for non-detection, so unless your detectability is really high...
4) You just did that...
5) Go out and design a study to confirm this is a real result and not just a random, lucky result (half serious on this one)

How big are your sample sizes? Is it a covariate that you would have expected a priori to be so highly correlated with presence/absence?

People are getting pretty nervous these days, noting that they've already searched the forum. ;-)

by **Mateo** » Wed Jul 14, 2010 12:31 pm

To clarify, the variable predicts absence, but not presence. For example, "no wings" predicts "no flight" perfectly, but "wings" does not predict "flight". A kiwi would be an apt example. :wink:

2) If I limit my data to "wings" only, I still have sites that are present or absent and I could test the relationship between other covariates and presence. But I don't really like this approach because it excludes data and a good predictor.
3) You have a good point, which is why I have not yet gone down that road. Could I analyze models, minus the separated variable, both in PRESENCE and with a bias-reduced logistic regression, and if the beta estimates are similar, then use the bias-reduced regression?
5) Don't have the time or money for field work right now.

I only have 25 presences, so precision is low and the perfect predictor could be part luck. However, the result is biologically sensible. There is one other study on this topic, and the same variable was a perfect predictor. They used an ANOVA (no occupancy) and they simply excluded that variable (my solution #1).

by **darryl** » Wed Jul 14, 2010 10:35 pm

25 presences out of how many surveys from how many sites?

Back to your original question though, a beta estimate of -37 is reasonable in this case and as for the SE for the beta, it should probably be a really large number rather than 0. Did you get any warnings? Essentially though you have a boundary problem in that for those sites your estimate of occupancy is 0. In these situations you can get unstable results for the standard errors. What I suggest you do is rerun that model but fix the beta estimate to something like -37 and see if that changes the SE's for the other parameters in the model.
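The boundary problem can be sketched numerically. The Python snippet below is only an illustration under stated assumptions: it looks at the log-likelihood contribution of a hypothetical group of 20 never-detected sites with the covariate present, with an intercept of -1, p = 0.4 and 3 visits (all made-up values), and estimates the curvature with respect to the covariate's beta by finite differences. It is not a description of how PRESENCE or MARK compute their variances.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def separated_loglik(b1, b0=-1.0, p=0.4, K=3, n=20):
    """Log-likelihood contribution of n never-detected sites with x = 1."""
    psi = expit(b0 + b1)
    miss_all = (1.0 - p) ** K                  # P(no detections | occupied)
    return n * math.log(psi * miss_all + (1.0 - psi))

def curvature(f, b, h=0.01):
    """Central finite-difference estimate of -f''(b) (observed information)."""
    return -(f(b + h) - 2.0 * f(b) + f(b - h)) / h**2

info = {b1: curvature(separated_loglik, b1) for b1 in (-2.0, -5.0, -37.0)}
for b1, i in info.items():
    # Rough SE from the curvature alone; infinite when the surface is flat.
    se = 1.0 / math.sqrt(i) if i > 0 else float("inf")
    print(f"b1 = {b1:6.1f}   information = {i:.3e}   approx SE = {se:.3e}")
```

As beta heads toward the boundary the curvature shrinks, and at -37 it is zero in double precision. Inverting a matrix with an entry like that can return anything from an enormous SE to a meaningless one, depending on how the software handles it.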

How does this model rank compared to other models that you've considered?

by **Mateo** » Thu Jul 15, 2010 4:59 pm

darryl wrote: "25 presences out of how many surveys from how many sites?"
I have 537 sites, surveyed 3 times. Out of ~1600 searches, I had 45 detections spread across 25 sites. Needle in a haystack!
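The arithmetic behind those counts, as a short Python check. The variable names are mine; the four input numbers are the ones given above. Note that the naive occupancy figure ignores imperfect detection, so it is only a lower-bound style summary, not a model estimate.

```python
sites = 537
visits_per_site = 3
detections = 45
sites_with_detections = 25

searches = sites * visits_per_site                    # 1611, i.e. the "~1600"
naive_occupancy = sites_with_detections / sites       # share of sites with >= 1 detection
detections_per_search = detections / searches
detections_per_detected_site = detections / sites_with_detections

print(f"searches: {searches}")
print(f"naive occupancy: {naive_occupancy:.3f}")
print(f"detections per search: {detections_per_search:.3f}")
print(f"detections per site with a detection: {detections_per_detected_site:.1f}")
```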

darryl wrote: "Back to your original question though, a beta estimate of -37 is reasonable in this case and as for the SE for the beta, it should probably be a really large number rather than 0. Did you get any warnings? Essentially though you have a boundary problem in that for those sites your estimate of occupancy is 0. In these situations you can get unstable results for the standard errors."
I agree that a large SE makes more sense. I originally did this in RMark and got B=-37 and SE=E-9. The only warning from RMark was that the number of parameters was one too low, obviously due to this parameter. I just redid it in PRESENCE and got B=-44 and SE=E+10. I got the "numerical convergence may not have been reached" warning. Which is obviously true.

darryl wrote: "What I suggest you do is rerun that model but fix the beta estimate to something like -37 and see if that changes the SE's for the other parameters in the model."
Although I know how to fix a parameter, like p or psi, in MARK or PRESENCE, I don't know how to fix a beta estimate. I asked a couple people in the office, checked Google, and checked the forum, but maybe I overlooked something. I tried to fix psi, just to see, and it just fixes psi at all sites and does not estimate any covariates.
So I couldn't follow your advice. However, I think the real question was, "are the estimates unstable?", so I tried a different tack. I changed the covariate for one occupied site to remove the data separation. Although it is faking data, this is a crude solution that I have seen used. Then I reran the model. For the intercept, other covariates, and p, estimates changed by <7% and SEs changed by <2%. So I think the other estimates are pretty stable.

darryl wrote: "How does this model rank compared to other models that you've considered?"
If I allow the separation and use B=-37, it gets 71% of the weight. If I subset the data to remove the variable, it is down to 0.3% weight. If I use the bias-reduced logistic regression, it is 9%, and if I use the fake data "fix" above, then it is 4%. So the choice of analysis makes a big difference.

Sorry for the long response, and thanks for taking an interest in my problem!!

by **darryl** » Thu Jul 15, 2010 5:24 pm

Hopefully Jim Hines will weigh in here about fixing beta parameters, as I can't exactly recall myself. It's a 'hidden' option that we've played with in the past, but I guess I've let the cat out of the bag now.

You're not studying ivory-billed woodpeckers are you? ;-)

Your data are relatively sparse, so results are going to be sensitive. I'm a bit confused though where you say that if you subset the data to remove the variable, the model still has 0.3% weight. How can you fit this model if you've removed this variable? Is this variable just 1 category in a categorical covariate with multiple levels? If results are biologically reasonable, I'd be inclined to leave it all in there. Final inferences are always conditional upon everything that has been done including method of analysis, models fit to the data, data collection, etc.

by **Mateo** » Thu Jul 15, 2010 5:43 pm

PRESENCE has an easter egg, huh?

The variable is binary, but the model had 3 covariates (a stretch I know, but they all seem to be estimated ok) so if I subset and drop the problematic variable, the adjusted model, now with 2 covariates, gets 0.3% of the weight.

Leave it all, hmm I will have to chew on that.

by **berghsm** » Tue Aug 10, 2010 3:47 pm

Just wondering if anyone found out how to fix beta parameters? I have a covariate where beta = 29 and SE = 164549. It makes sense behaviorally for this covariate to have a large influence on detection so I am hoping for a more reasonable SE estimate. Thanks.

by **darryl** » Tue Aug 10, 2010 4:47 pm

Fixing the beta won't give you a more reasonable SE of that estimate in this case; in fact, you'll end up with an SE of 0 if you do fix it (i.e. the parameter is now assumed to be known). As your beta values get large (with either positive or negative sign), the SE will also tend to get larger. This is due to the logit link function, so there's not much that can be done about the problem unless you move to another link function. If all your other SEs look OK, I wouldn't worry too much about it.
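One way to see the logit-link effect is to push the SE back to the probability scale with the first-order delta method, where SE(p) is roughly p(1 - p) times SE(beta). The Python sketch below uses the beta = 29 and SE = 164549 quoted above, and it assumes (purely for illustration) that the linear predictor equals beta, i.e. it ignores the intercept and any other covariates. The function name is hypothetical, and the delta method is only a linear approximation that should not be trusted much when the SE is this large.

```python
import math

def prob_scale_se(beta, se_beta):
    """First-order (delta-method) SE of expit(beta) given the SE of beta."""
    e = math.exp(-abs(beta))        # numerically stable for large |beta|
    slope = e / (1.0 + e) ** 2      # d expit / d beta = p * (1 - p)
    return slope * se_beta

beta, se_beta = 29.0, 164549.0      # values quoted in the post above
se_p = prob_scale_se(beta, se_beta)
print(f"p * (1 - p) at beta = {beta}: {se_p / se_beta:.3e}")
print(f"approximate SE on the probability scale: {se_p:.3e}")
```

Because p(1 - p) is vanishingly small when beta is far from 0, a huge SE on the logit scale can correspond to a tiny SE for the probability itself, which fits the advice not to worry if the other SEs look fine.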

by **berghsm** » Wed Aug 11, 2010 2:42 pm

Yes, the other SEs are fine. Can I move to another link function in PRESENCE like you can in MARK?

data separation and occupancy
