occupancy as response variable in logistic regression

Postby tpinn » Wed Jun 27, 2012 12:01 pm

Hello,
I am interested in using site occupancy estimates as the response variable in a logistic regression in order to relate occupancy to some variables of interest. Does anyone know if there are any theoretical problems with this idea? The variables of interest would not be used to generate the occupancy estimates.

Along another line... does anyone know if pseudo-R-squared measures can be used to address model fit or model performance in occupancy models? Thanks!
tpinn
 
Posts: 13
Joined: Tue Jun 26, 2012 12:09 am

Re: occupancy as response variable in logistic regression

Postby bacollier » Wed Jun 27, 2012 12:04 pm

tpinn wrote:Hello,
I am interested in using site occupancy estimates as the response variable in a logistic regression in order to relate occupancy to some variables of interest. Does anyone know if there are any theoretical problems with this idea? The variables of interest would not be used to generate the occupancy estimates.


I suppose the primary question I have is why you would not use the variables of interest which you think influence the occupancy parameter in the actual modeling of occupancy.

tpinn wrote:Along another line... does anyone know if pseudo-R-squared measures can be used to address model fit or model performance in occupancy models? Thanks!


AFAICT, no.
bacollier
 
Posts: 230
Joined: Fri Nov 26, 2004 10:33 am
Location: Louisiana State University

Re: occupancy as response variable in logistic regression

Postby tpinn » Wed Jun 27, 2012 1:30 pm

I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (e.g., Murray and Conner 2009). So I wanted to use logistic regression (with the occupancy estimates as the response variable) so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!
tpinn
 
Posts: 13
Joined: Tue Jun 26, 2012 12:09 am

Re: occupancy as response variable in logistic regression

Postby bacollier » Wed Jun 27, 2012 2:43 pm

tpinn wrote:I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (e.g., Murray and Conner 2009).


IT methods balance the bias/variance tradeoff at the level of the model set, so it's not surprising that alternative approaches exist for statistically evaluating the relative importance of a single predictor. However, a really important single predictor in a badly fitting model is probably no good either. I have not read M&C, so I have no thoughts on that; maybe someone else does.

tpinn wrote:So I wanted to use logistic regression (with the occupancy estimates as the response variable), so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!


Not really, other than that occupancy estimates are not binary (i.e., they range from 0 to 1). Unless you are categorizing them as success/fail based on some arbitrary cutoff, I am not sure how you would run a logistic regression on them and get what you seem to want. Maybe you meant linear regression?

Unless I am misunderstanding your post, it seems you are planning to estimate occupancy as a function of some covariates (size, veg, whatever) with an occupancy-type set of candidate models, and then predict occupancy at a bunch of locations based on the covariate values for those locations (e.g., sites with veg=10, 20, 30). You would then regress the occurrence probabilities you predicted for each location on those same ecological covariates (size, veg, whatever) via some regression model to get a measure of variable importance. I am not sure that you are gaining any information.
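
For contrast with the two-stage plan described above, here is a minimal sketch of how a habitat covariate can enter the occupancy model directly, through a logit link on the occupancy probability. It is not taken from any poster's analysis: the data, the covariate name `veg`, the function names, and the constant detection probability `p` are all assumptions made for illustration, and the likelihood is the standard single-season form.

```python
import math

def inv_logit(x):
    """Map a linear predictor on the real line to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def occupancy_loglik(beta0, beta1, p, histories, covariate):
    """Log-likelihood of a single-season occupancy model.

    psi_i = inv_logit(beta0 + beta1 * x_i) is the occupancy probability at
    site i; p is a constant per-visit detection probability (an assumption
    made here to keep the sketch short).
    """
    total = 0.0
    for y, x in zip(histories, covariate):
        psi = inv_logit(beta0 + beta1 * x)
        # Probability of the observed history given the site is occupied.
        detect = 1.0
        for obs in y:
            detect *= p if obs == 1 else (1.0 - p)
        site_lik = psi * detect
        if sum(y) == 0:
            # Never detected: the site may also be truly unoccupied.
            site_lik += 1.0 - psi
        total += math.log(site_lik)
    return total

# Hypothetical data: 4 sites, 3 visits each, one standardized covariate.
histories = [(1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]
veg = [1.2, -0.8, 0.3, -1.1]

ll = occupancy_loglik(beta0=0.0, beta1=1.0, p=0.5,
                      histories=histories, covariate=veg)
print(round(ll, 4))
```

Maximizing this function over `beta0`, `beta1`, and `p` gives a coefficient for the covariate that already accounts for imperfect detection, which is the information a second-stage regression on predicted occupancy would be trying to recover.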

bret
bacollier
 
Posts: 230
Joined: Fri Nov 26, 2004 10:33 am
Location: Louisiana State University

Re: occupancy as response variable in logistic regression

Postby tpinn » Wed Jun 27, 2012 4:36 pm

Well, I intended to use detection covariates to generate occupancy estimates, and then regress different explanatory variables (habitat, etc.) against the occupancy estimates. But, I suppose this approach would make it more difficult to generate accurate occupancy estimates.

I think logistic regression would be the best regression method (as the data are proportions and not continuous data), but I'm not sure how to use proportions generated in another program (e.g., Presence) as the response variable for logistic regression (though I know that in some software you can use binary or frequency data in logistic regression). I'm just trying to find the best way to determine the relative importance of the explanatory variables that relate to occupancy.

Thanks for your thoughts on the issue.
tpinn
 
Posts: 13
Joined: Tue Jun 26, 2012 12:09 am

Re: occupancy as response variable in logistic regression

Postby bacollier » Wed Jun 27, 2012 5:06 pm

tpinn wrote:I think logistic regression would be the best regression method (as the data are proportions and not continuous data), but I'm not sure how to use proportions generated in another program (e.g., Presence) as the response variable for logistic regression (though I know that in some software you can use binary or frequency data in logistic regression). I'm just trying to find the best way to determine the relative importance of the explanatory variables that relate to occupancy.
Thanks for your thoughts on the issue.


FWIW, I don't think you can treat the estimated occupancy probabilities as proportions from some fixed number of locations. Proportions are typically something like 1:3 or 2:6, in an events/trials format, which would be appropriate for a logistic regression model. What you have is a probability estimate (0.33 | some covariate value), which is not mathematically the same as 1:3, nor is it the same as saying 33% of n sites are occupied.

Note that your data are actually continuous, yet bounded at the lower and upper limits of a probability (0 and 1). You can use binary and frequency data in logistic regression in almost any stat package you choose, using either logit or cloglog link functions, for instance. Many folks (wrongly, imho) arcsine-transform proportional data for linear regression models, but that is another discussion entirely.
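
To make the events/trials point concrete, here is a small sketch with made-up counts. It shows that 1:3, 2:6, and 100:300 share the same proportion but carry different amounts of information, which a bare estimate of 0.33 cannot convey, and it writes out the logit and cloglog links mentioned above. The function names are illustrative and not from any particular package.

```python
import math

def logit(p):
    """Logit link: log odds of p."""
    return math.log(p / (1.0 - p))

def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))

def binomial_se(events, trials):
    """Standard error of a proportion observed as events out of trials."""
    phat = events / trials
    return math.sqrt(phat * (1.0 - phat) / trials)

# Same proportion, very different precision.
for events, trials in [(1, 3), (2, 6), (100, 300)]:
    phat = events / trials
    print(events, trials, round(phat, 3), round(binomial_se(events, trials), 3))

# The two links put the same probability on different scales.
print(round(logit(1 / 3), 3), round(cloglog(1 / 3), 3))
```

A binomial regression uses the trial counts to weight each observation; an estimated probability from an occupancy model has its own sampling variance but no trial count to play that role.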

As an aside, you might be able to look at the summed model weights for models containing various parameters (B&A show how to do this), but I have never been a fan of that approach because it requires your model set to be fully balanced to give an accurate representation; e.g., if one variable is in all the models, its relative variable importance will be 1. I suppose you could look at the 'effect' that including or removing a variable has on your occupancy predictions, maybe by comparing AUC/ROC estimates for models with and without a particular covariate, but that probably would not be very robust. Someone else may have thoughts on this.
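
A short sketch of the summed-weights calculation, using a hypothetical and deliberately unbalanced model set in which `veg` appears in every model. The AIC values and covariate names are invented for illustration; the weight formula is the usual Akaike weight.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-delta/2), normalized over the model set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical model set: (covariates on occupancy, AIC).
models = [
    ({"veg"}, 210.4),
    ({"veg", "size"}, 209.1),
    ({"veg", "elev"}, 211.9),
    ({"veg", "size", "elev"}, 210.8),
]
weights = akaike_weights([aic for _, aic in models])

def summed_weight(var):
    """Sum the weights of all models that contain the covariate."""
    return sum(w for (covs, _), w in zip(models, weights) if var in covs)

for var in ("veg", "size", "elev"):
    print(var, round(summed_weight(var), 3))
```

Because `veg` is in every model, its summed weight is 1 regardless of how much it improves fit, which is the imbalance problem described above.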

bret
bacollier
 
Posts: 230
Joined: Fri Nov 26, 2004 10:33 am
Location: Louisiana State University

Re: occupancy as response variable in logistic regression

Postby jlaufenb » Wed Jun 27, 2012 5:51 pm

Bret,

B&A discuss the use of evidence ratios as a measure of importance of the most supported model compared to other, less supported models. I would think an evidence ratio based on the likelihoods (or equivalently AIC weights) from a pair of (occupancy) models, where one model contains the covariate of interest and one does not, could be used to gain inference on the importance of that covariate. The advantage of this over using estimated weights of evidence is that evidence ratios do not depend on the full set of models, thus not requiring a balanced model set. Thoughts?
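
As a rough illustration of the claim that the evidence ratio does not depend on the rest of the model set, the sketch below computes the ratio from a pair of AIC values and checks that it matches the ratio of Akaike weights in both a two-model and a four-model set. All AIC values are hypothetical, and the ratio here is based on AIC rather than on raw likelihoods.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-delta/2), normalized over the model set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def evidence_ratio(aic_with, aic_without):
    """Evidence for the model with the covariate relative to the one without."""
    return math.exp(-0.5 * (aic_with - aic_without))

# Hypothetical AIC values for one pair of occupancy models.
aic_with, aic_without = 209.1, 210.4
er = evidence_ratio(aic_with, aic_without)

# The individual weights change when other models join the set...
w_small = akaike_weights([aic_with, aic_without])
w_large = akaike_weights([aic_with, aic_without, 205.0, 214.2])

# ...but the ratio of the two weights of interest does not.
print(round(er, 3))
print(round(w_small[0] / w_small[1], 3), round(w_large[0] / w_large[1], 3))
```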

Jared
jlaufenb
 
Posts: 49
Joined: Tue Aug 05, 2008 2:12 pm
Location: Anchorage, AK

Re: occupancy as response variable in logistic regression

Postby cooch » Wed Jun 27, 2012 6:44 pm

tpinn wrote:I am interested in determining the relative importance of the explanatory variables, and not everyone agrees that the IT/model averaging approach is the best method in this regard (e.g., Murray and Conner 2009). So I wanted to use logistic regression (with the occupancy estimates as the response variable) so that I can use standardized regression coefficients to determine relative importance. Do you have any thoughts on this approach? Thanks!



The method is generally robust if the candidate model set is balanced with respect to the various factors. There are some issues related to model redundancy, but for the moment this is perhaps the best compromise relative to the 'perfect solution', which doesn't exist.
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: occupancy as response variable in logistic regression

Postby cooch » Wed Jun 27, 2012 6:45 pm

tpinn wrote:Well, I intended to use detection covariates to generate occupancy estimates, and then regress different explanatory variables (habitat, etc.) against the occupancy estimates.



This is doing statistics on statistics. Don't do it.
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: occupancy as response variable in logistic regression

Postby cooch » Wed Jun 27, 2012 6:47 pm

jlaufenb wrote:Bret,

B&A discuss the use of evidence ratios as a measure of importance of the most supported model compared to other, less supported models. I would think an evidence ratio based on the likelihoods (or equivalently AIC weights) from a pair of (occupancy) models, where one model contains the covariate of interest and one does not, could be used to gain inference on the importance of that covariate. The advantage of this over using estimated weights of evidence is that evidence ratios do not depend on the full set of models, thus not requiring a balanced model set. Thoughts?

Jared


Doing so (as described) amounts to a form of a LRT. That is fine, as it stands, but suffers from the usual problem(s) of simple paired model comparisons (the most important of which is -- you get a different answer depending on which pair of models with and without a factor you compare).
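
The pair-dependence problem can be sketched with a likelihood ratio test on two different nested pairs that both differ only by `veg`. The maximized log-likelihoods below are invented for illustration, and the p-value helper is specific to a 1-df test (one extra parameter).

```python
import math

def lrt_1df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and p-value for one extra parameter.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical maximized log-likelihoods for four occupancy models.
loglik = {
    "psi(.)": -104.2,
    "psi(veg)": -101.0,
    "psi(size)": -100.1,
    "psi(veg+size)": -99.6,
}

# Two pairs that each differ only by the veg covariate.
pairs = [("psi(.)", "psi(veg)"), ("psi(size)", "psi(veg+size)")]
results = {}
for reduced, full in pairs:
    stat, pval = lrt_1df(loglik[reduced], loglik[full])
    results[(reduced, full)] = (stat, pval)
    print(reduced, "vs", full, round(stat, 2), round(pval, 4))
```

With these made-up numbers, `veg` looks important when added to the null model but not when `size` is already present, so the apparent importance of the covariate depends on which pair is compared.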
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University
