Model averaging visualisation

questions concerning analysis/theory using program PRESENCE

Model averaging visualisation

Postby Michalis » Mon Jun 20, 2011 2:59 pm

I am a new user of PRESENCE and am currently running version 2.4.
I am analyzing occupancy data for fish in a large area of Greece.
I ran 5 models using different site covariates for psi, and for each of those I also ran a p(observer) version, so 10 models in total.
I took the per-site psi estimates from each model together with the AIC weights, put them all in a spreadsheet, and calculated the average psi for each site as Average_psi(site1) = psi(model1)*wgtAIC(model1) + psi(model2)*wgtAIC(model2) + ... + psi(model10)*wgtAIC(model10), and so on for the rest of the sites.
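In Python terms the spreadsheet calculation looks roughly like this (the psi values and AIC weights below are made-up placeholders, not my actual estimates, and only 3 of the 10 models are shown):

# AIC-weighted model averaging of per-site psi estimates (illustrative numbers only)
import numpy as np

# rows = models, columns = sites; psi[m, s] is the psi estimate for site s under model m
psi = np.array([
    [0.62, 0.55, 0.71],   # model 1
    [0.58, 0.49, 0.68],   # model 2
    [0.65, 0.60, 0.74],   # model 3
])
aic_weights = np.array([0.50, 0.30, 0.20])   # AIC weights; must sum to 1

# Average_psi(site) = sum over models of psi(model, site) * wgtAIC(model)
avg_psi = aic_weights @ psi
print(avg_psi)   # one model-averaged estimate per site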
Setting aside the standard errors of the model-averaged estimates for now, is it appropriate to visualize the average psi on a GIS map, using larger and smaller symbols for the different psi values, in order to draw some inference about spatial trends in occupancy across the study area? And what is the real meaning of this estimate for a site whose average psi is, say, 70% but which is known to be occupied (the species was detected there)? Would it be more appropriate to use the derived psi estimates (DERIVED parameter - Psi-conditional : [Pr(occ | detection history)]) given in the PRESENCE output as input for the model averaging?
Please excuse my rather naive questions, as I am only just starting on the matter. I would really appreciate any opinion on the above.
Michalis
 
Posts: 4
Joined: Sun Jun 19, 2011 3:44 pm

Re: Model averaging visualisation

Postby Michalis » Sun Jun 26, 2011 3:06 pm

I just installed PRESENCE v3.1 and used the model-averaged estimates option to check the validity of my averaged psi estimates for each site (the ones I calculated in Excel, as mentioned above).
Unfortunately the results were not the same. I also noticed that for two sites with identical categorical covariates and the same detection history, the model-averaged estimate of psi was different. How can this be? What formula does PRESENCE v3.1 use to derive the model-averaged psi estimates?

Any help on the above will be highly appreciated
Michalis
 
Posts: 4
Joined: Sun Jun 19, 2011 3:44 pm

Re: Model averaging visualisation

Postby jhines » Fri Jul 01, 2011 11:10 am

In your first post, it looks like you're computing the model-averaged estimates correctly. Usually, the conditional occupancy estimates (from the 'derived parameters' section) are used for generating maps of occupancy, but it depends on whether you want to show occupancy given the data collected, or just the probability of occupancy, disregarding whether each individual site was actually found to be occupied. Neither one is 'right' or 'wrong'.
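As a rough illustration of the difference (not the exact PRESENCE computation, which also handles covariates and delta-method standard errors), here is what the conditional estimate looks like for a simple psi(.)p(.) model with constant detection probability; the psi and p values are placeholders:

# Pr(occupied | detection history) for a constant-p occupancy model (placeholder values)
def conditional_psi(psi, p, history):
    if any(h == 1 for h in history):
        return 1.0                        # detected at least once => certainly occupied
    k = len(history)                      # all-zero history over k surveys
    missed = psi * (1 - p) ** k           # occupied but missed on every survey
    return missed / (missed + (1 - psi))

print(conditional_psi(0.70, 0.40, [0, 0, 0]))   # about 0.34, lower than the unconditional 0.70
print(conditional_psi(0.70, 0.40, [0, 1, 0]))   # 1.0: a site with a detection is occupied for certain

This also speaks to the question above about the 70% site: conditional on a detection, its occupancy is 1, whereas the unconditional psi stays at the covariate-based value.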

The model-averaging in the new version of PRESENCE is something I added recently and probably only works for very specific cases (e.g., all models need to have the same parameters, and I'm not sure what is done with missing values). I added this option since a few people said it would be nice if it could be done, but I really need to know how people would like to use it. Perhaps you could send me your files and I could check to see if we're computing it the same way? (I'd need the pres_backup5.zip file in the project folder.)

Jim
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Model averaging visualisation

Postby Michalis » Sat Jul 09, 2011 6:18 am

Hi Jim,

Thank you a lot for your reply.

I have uploaded the whole project folder in a zip file, since I could not find the pres_backup5.zip file you mentioned. Please download it from:

http://www.sendspace.com/file/fiuuxa

As you will see, sites 2 and 3 should have the same average psi estimates, since they both have the same detection history (0,0,0) and the same categorical site covariates.

I also uploaded my Excel calculations for you to have a look at.

http://www.sendspace.com/file/k8vpce

Let me also explain my concerns about this. I found that the best model (for another species) was psi(Zone)p(....), where "Zone" is the categorical site covariate for the 3 zones of the national park I study. Zone A has the strictest protection measures (e.g. for fishing), so I would expect psi(Zone_A) > psi(Zone_B) > psi(Zone_C), but this was not the case in the model's results. So I thought maybe some other factor influences occupancy, and I wanted to see whether there is a spatial relation/trend in occupancy that would explain the success of the model (e.g. north to south, east to west, distance from the shore, distance from point pollution sources), and then come up with the right covariate and test it in a model that I would know a priori should have good AIC values.

Do you think this approach is sound?

Thank you again

Michalis

P.S. Excuse my late reply; I was out of town last week.
Michalis
 
Posts: 4
Joined: Sun Jun 19, 2011 3:44 pm

Re: Model averaging visualisation

Postby jhines » Mon Jul 11, 2011 1:09 pm

Hi Michalis,

As I suspected, the model-averaging in PRESENCE will not work unless all models have estimates from all sites printed out. Since you have models with constant occupancy, the individual site estimates are not printed for those models. So I would recommend doing the model-averaging in a spreadsheet (as you have done). I'll try to think of a way to make the model-averaging work regardless of whether individual site estimates are printed.

With only 35 sites, the standard errors of the occupancy estimates are fairly large (i.e., precision is low). This makes it difficult for the model to detect a significant difference between the groups, and it's possible that the estimates are so imprecise that they aren't very close to reality. So it's not surprising that a particular dataset gives estimates in the wrong order. In general, I think it's OK to search for other explanations (models) for the differences between groups, as long as you state that it was done a posteriori. With only 3 groups, however, there is only a very limited number of possible outcomes for a model, and your 'best' model might be due to random chance. Even with your a priori prediction, there would be a 1 out of 6 chance that the estimates would come out in the predicted order purely by chance. To make a reasonably solid argument for your a priori prediction, I think you would need a decent number of datasets which all (or most) come out in the predicted order.
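To spell out that '1 out of 6' figure: with 3 zones there are 3! = 6 possible orderings of the estimates, so a single dataset matches any one predicted ordering with probability 1/6 by chance alone. A trivial check in Python:

from itertools import permutations

orders = list(permutations(['A', 'B', 'C']))   # all possible orderings of the 3 zone estimates
print(len(orders))        # 6
print(1 / len(orders))    # 0.1666..., the chance of matching the predicted order at random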
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Model averaging visualisation

Postby Michalis » Thu Jul 14, 2011 4:54 pm

Dear Jim,

Thank you for taking the time to look at my case.

I would like to illustrate the way I used model averaging:

Having concluded that the Zone effect was not due to the stricter conservation measures applied in each zone (because of the contradictions in the psi estimates for each zone that I pointed out in my previous post), I started thinking that if I could see the weighted results of all models on a map, I might find a spatial pattern in the data that I could use to build another model and then test it against the ones already run. So I averaged the models in the spreadsheet and made the following map in ArcMap (please see the attached image showing the different classes of average psi for each site):

http://www.sendspace.com/file/vt9v0i

Note that all sites within Zone B are deep red (which shows the influence of the 'best' model).
I also noticed that the red and orange dots lined up with the distance of the sampling sites from the deep isolines, i.e. the average occupancy probability was high along shores that deepen quickly. The shape of the -20 m isoline also fitted the average occupancy estimates better than the -15 m one. So I calculated the distance of each sampling point from both isolines and converted the values into z-scores to use as a continuous site covariate.
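Written out in Python, that z-score step looks like this (the distances below are invented placeholders, not my measured values):

import numpy as np

# distance (m) of each sampling point from the -20 m isoline (placeholder values)
dist_20m = np.array([120.0, 85.0, 300.0, 45.0, 210.0])
z_20m = (dist_20m - dist_20m.mean()) / dist_20m.std(ddof=1)   # standardize to mean 0, SD 1
print(np.round(z_20m, 2))   # continuous site covariate for psi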
As I expected (and had foreseen from the spatial distribution of the model-averaged psi estimates), the -20 m model (Model C7) ranked first with the lowest AIC score, and the -15 m model (Model C8) ranked second. This is in line with the species' biology (it likes steep reef waters).

I uploaded the whole model set for this species in the above link.

So do you think it is wrong to conclude that depth is most probably a crucial factor in the occupancy of the sites, and that proximity to deeper water is one of the likely factors affecting the distribution of the species along the shore? Was it wrong to follow that line of thinking?

The sampling design was indeed limited by time and budget, and it would certainly be better if I had more data to work with. But as a beginner, I would like to be able to explain why my data are not sufficient for the situation I am dealing with and the way I analyse it. So could you please help me a bit with this? Which part of a model's output (which statistic) could I use to evaluate my analysis? I guess it is the standard errors of the psi estimates and the associated 95% confidence intervals, but what about the bootstrap results or other statistics? Any advice on how to use these values would also be welcome (if I'm not pushing things too much).

Thanks again

Michalis
Michalis
 
Posts: 4
Joined: Sun Jun 19, 2011 3:44 pm

Re: Model averaging visualisation

Postby jhines » Thu Jul 21, 2011 9:37 am

Michalis,

Your line of thinking regarding the hypothesis about water depth seems OK to me. To test it, you would look at the beta estimate (untransformed estimate) in the output that is associated with the depth covariate. If this estimate is significantly different from zero (relative to its standard error), that is an indication that the depth covariate is an important factor describing occupancy. The bootstrap estimates and standard errors are useful if the parameter estimation gives strange or invalid results. Sparse data can sometimes make estimates and standard errors difficult to compute, and the program will indicate that some of them could not be estimated. When that happens, bootstrapping can yield estimates for the data.
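In rough Python terms, the check on the beta looks like this (beta_hat and se_beta below are placeholders; the real values come from the PRESENCE output for the depth covariate):

from math import erf, sqrt

beta_hat = -1.8   # hypothetical untransformed (beta) estimate for the depth covariate
se_beta = 0.6     # hypothetical standard error from the output

z = beta_hat / se_beta                                   # Wald-type z statistic
ci = (beta_hat - 1.96 * se_beta, beta_hat + 1.96 * se_beta)
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided normal p-value

print(f"z = {z:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.3f}")
# if the confidence interval excludes zero, the depth covariate is doing real work in the model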

Jim
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

