occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 12:01 pm
by tpinn
Hello,
I am interested in using site occupancy estimates as the response variable in a logistic regression in order to relate occupancy to some variables of interest. Does anyone know if there are any theoretical problems with this idea? The variables of interest would not be used to generate the occupancy estimates.

Along another line... does anyone know if pseudo-R-squared measures can be used to address model fit or model performance in occupancy models? Thanks!

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 12:04 pm
by bacollier
tpinn wrote:Hello,
I am interested in using site occupancy estimates as the response variable in a logistic regression in order to relate occupancy to some variables of interest. Does anyone know if there are any theoretical problems with this idea? The variables of interest would not be used to generate the occupancy estimates.


I suppose the primary question I have is why you would not use the variables of interest which you think influence the occupancy parameter in the actual modeling of occupancy.

tpinn wrote:Along another line... does anyone know if pseudo-R-squared measures can be used to address model fit or model performance in occupancy models? Thanks!


AFAICT, no.

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 1:30 pm
by tpinn
I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (i.e. Murray and Conner 2009). So I wanted to use logistic regression (with the occupancy estimates as the response variable), so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 2:43 pm
by bacollier
tpinn wrote:I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (i.e. Murray and Conner 2009).


As IT methods are used to balance variance/bias tradeoffs at the model-set level, it's not surprising that alternative approaches exist for statistically evaluating the relative importance of a single predictor. However, a really important single predictor in a badly fitting model is probably no good either. I have not read M&C, so I have no thoughts on that; maybe someone else does.

tpinn wrote:So I wanted to use logistic regression (with the occupancy estimates as the response variable), so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!


Not really, other than that occupancy estimates are not binary (i.e., they range from 0 to 1), so unless you are categorizing them as success/fail based on some arbitrary cutoff, I am not sure how you will run a logistic regression on them and get what you seem to want. Maybe you meant linear regression?

It seems, unless I am misunderstanding your post, that you are planning to estimate occupancy as a function of some covariates (size, veg, whatever) with an occupancy-type set of candidate models, and then predict occupancy at a bunch of locations based on the covariate values of interest for those locations (e.g., sites with veg=10, 20, 30). You would then regress the predicted occurrence probabilities on those same ecological covariates (size, veg, whatever) via some regression model to get some measure of variable importance? I am not sure that you are gaining any information.

bret

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 4:36 pm
by tpinn
Well, I intended to use detection covariates to generate occupancy estimates, and then regress different explanatory variables (habitat, etc.) against the occupancy estimates. But, I suppose this approach would make it more difficult to generate accurate occupancy estimates.

I think logistic regression would be the best regression method (as the data are proportions and not continuous data), but I'm not sure how to use proportions generated in another program (i.e. Presence) as the response variable for logistic regression (though I know that in some software you can use binary or frequency data in logistic regression). I'm just trying to find the best way to determine the relative importance of the explanatory variables that relate to occupancy.

Thanks for your thoughts on the issue.

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 5:06 pm
by bacollier
tpinn wrote:I think logistic regression would be the best regression method (as the data are proportions and not continuous data), but I'm not sure how to use proportions generated in another program (i.e. Presence) as the response variable for logistic regression (though I know that in some software you can use binary or frequency data in logistic regression). I'm just trying to find the best way to determine the relative importance of the explanatory variables that relate to occupancy.
Thanks for your thoughts on the issue.


FWIW, I don't think you can consider the estimated occupancy probabilities as proportions from some fixed number of locations. Typically proportions are something like 1:3 or 2:6, in some sort of an events/trials format, which would be appropriate for a logistic regression model. What you have is a probability estimate (0.33 | some covariate value), which is not the same mathematically as 1:3, nor is it the same as saying 33% of n sites are occupied.
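To make the events/trials point concrete, here is a minimal sketch (the helper name binomial_se is made up for illustration). The same proportion carries very different amounts of information depending on the number of trials behind it, and a bare probability estimate from an occupancy model has no trial count attached at all.

```python
import math

def binomial_se(events, trials):
    """Standard error of an observed proportion events/trials (hypothetical helper)."""
    p = events / trials
    return math.sqrt(p * (1 - p) / trials)

# Same proportion (1/3), very different precision as the number of trials grows.
for events, trials in [(1, 3), (2, 6), (100, 300)]:
    p = events / trials
    print(f"{events}/{trials}: p = {p:.3f}, SE = {binomial_se(events, trials):.3f}")
```

A logistic regression fit to events/trials data uses the trial counts to weight each observation; an estimated probability of 0.33 supplies no such count, which is why it cannot simply stand in for 1:3.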

Note that your data are actually continuous, yet bounded at the upper and lower limits of a probability (0 and 1). You can use binary and frequency data in logistic regression in almost any stat package you choose, using either logit or cloglog link functions for instance, and many folks (wrongly imho) arc-sine transform proportional data for linear regression models, but that is another discussion entirely.

As an aside, you might be able to look at the summed model weights for models containing various parameters (B&A show how to do this), but I have never been a fan of that approach because it requires your model set to be fully balanced to give an accurate representation; e.g., if you have one variable in all the models, its relative variable importance will be 1. I suppose you could look at the 'effect' that including or removing a variable from a model has on your occupancy predictions, maybe by comparing AUC/ROC estimates for models with and without a particular covariate, though that probably would not be very robust. Someone else may have thoughts on this.
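A rough sketch of the summed-weights calculation and of the balance problem, assuming the usual Akaike-weight formula. The covariate names and AIC values are made up for illustration, and the function names are hypothetical rather than taken from any package.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-0.5 * delta_i), normalized over the model set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def summed_weights(models, weights):
    """Sum the weights of every model that contains each covariate."""
    importance = {}
    for covariates, w in zip(models, weights):
        for c in covariates:
            importance[c] = importance.get(c, 0.0) + w
    return importance

# Made-up, unbalanced candidate set: 'veg' appears in every model.
models = [("veg",), ("veg", "size"), ("veg", "size", "elev"), ("veg", "elev")]
aic = [210.4, 208.1, 209.6, 211.9]

weights = akaike_weights(aic)
importance = summed_weights(models, weights)
for name, value in sorted(importance.items()):
    print(f"{name}: {value:.3f}")
```

Because 'veg' is in all four models, its summed weight is 1 no matter how well or badly it predicts occupancy, which is the distortion an unbalanced model set introduces.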

bret

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 5:51 pm
by jlaufenb
Bret,

B&A discuss the use of evidence ratios as a measure of importance of the most supported model compared to other, less supported models. I would think an evidence ratio based on the likelihoods (or equivalently AIC weights) from a pair of (occupancy) models, where one model contains the covariate of interest and one does not, could be used to gain inference on the importance of that covariate. The advantage of this over using estimated weights of evidence is that evidence ratios do not depend on the full set of models, thus not requiring a balanced model set. Thoughts?
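A minimal sketch of that point, assuming the standard Akaike-weight formula: the ratio of two models' weights depends only on their AIC difference, so it comes out the same whatever else is in the model set. The AIC values and function names here are made up for illustration.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-0.5 * delta_i), normalized over the model set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def evidence_ratio(aic_with, aic_without):
    """Evidence for the model with the covariate relative to the one without."""
    return math.exp(0.5 * (aic_without - aic_with))

# Made-up AIC values for a pair of models with and without the covariate.
aic_with, aic_without = 208.1, 210.4

pair_only = akaike_weights([aic_with, aic_without])
larger_set = akaike_weights([aic_with, aic_without, 205.0, 215.0])

ratio_pair = pair_only[0] / pair_only[1]
ratio_larger = larger_set[0] / larger_set[1]
print(evidence_ratio(aic_with, aic_without), ratio_pair, ratio_larger)
```

All three printed values agree up to floating-point error; the normalizing constant that changes with the model set cancels in the ratio.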

Jared

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 6:44 pm
by cooch
tpinn wrote:I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (i.e. Murray and Conner 2009). So I wanted to use logistic regression (with the occupancy estimates as the response variable), so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!



The IT/model-averaging method is generally robust if the candidate model set is balanced wrt various factors. There are some issues related to model redundancy, but for the moment this is perhaps the best compromise relative to the 'perfect solution', which doesn't exist.

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 6:45 pm
by cooch
tpinn wrote:Well, I intended to use detection covariates to generate occupancy estimates, and then regress different explanatory variables (habitat, etc.) against the occupancy estimates.



This is doing statistics on statistics. Don't do it.

Re: occupancy as response variable in logistic regression

Posted: Wed Jun 27, 2012 6:47 pm
by cooch
jlaufenb wrote:Bret,

B&A discuss the use of evidence ratios as a measure of importance of the most supported model compared to other, less supported models. I would think an evidence ratio based on the likelihoods (or equivalently AIC weights) from a pair of (occupancy) models, where one model contains the covariate of interest and one does not, could be used to gain inference on the importance of that covariate. The advantage of this over using estimated weights of evidence is that evidence ratios do not depend on the full set of models, thus not requiring a balanced model set. Thoughts?

Jared


Doing so (as described) amounts to a form of an LRT. That is fine, as it stands, but suffers from the usual problem(s) of simple paired model comparisons (the most important of which is -- you get a different answer depending on which pair of models with and without a factor you compare).
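To spell out the LRT connection, here is a short sketch under stated assumptions: model 0 is nested in model 1, L_i is the maximized likelihood, K_i is the number of parameters, and w_i is the Akaike weight. The subscripts and symbols are notation introduced here, not taken from the thread.

```latex
\begin{align*}
\mathrm{AIC}_i &= -2\ln L_i + 2K_i \\
\Delta &= \mathrm{AIC}_0 - \mathrm{AIC}_1
        = \underbrace{2\,(\ln L_1 - \ln L_0)}_{\text{LRT statistic}} - 2\,(K_1 - K_0) \\
\text{evidence ratio} &= \frac{w_1}{w_0} = \exp\!\left(\tfrac{1}{2}\Delta\right)
        = \frac{L_1}{L_0}\, e^{-(K_1 - K_0)}
\end{align*}
```

For a fixed difference in parameter count, the evidence ratio is just a monotone transformation of the likelihood-ratio statistic. Since L_0 and L_1 change with whatever other covariates the chosen pair of models shares, so does the answer.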