I'm looking for a little feedback on an issue I discussed recently with my supervisor.
I've used a Bayesian approach to estimate species richness across a range of sites. The basic model structure is similar to that of Zipkin et al. (2009), "Impacts of forest fragmentation on species richness: a hierarchical approach to community modeling", J. Appl. Ecol. 46: 815-822. The models are working (hence I haven't posted the main code here), but I ran into an interesting issue.
First, I augmented the 3-D detection array (sites x reps x species) with an arbitrary number of all-zero latent species (nzeroes), starting with 50. The posterior distribution of N (the estimated number of species) was clearly right-truncated. It piled up against the ceiling of n + nzeroes, meaning the model expected more species in the combined set of sites than the augmented array allowed. So I increased the number of latent species until the truncation disappeared.
As a second part of the analysis, I estimated richness within ecological groups (guilds) by taking the inner product of the binary latent occurrence matrix for site j, Z[j, 1:n], with a binary guild-membership vector. Here n is the number of species actually observed across all sites, not n + nzeroes, because the augmented species can't be assigned to any ecological group. As a result, the sum of the guild estimates is smaller than the total estimated richness whenever the model places any augmented species in the community, since total richness is indexed over n + nzeroes rather than n. It was therefore suggested to me that I should instead model the total number of species that could possibly occur in the study area (and to which guilds can be assigned), regardless of whether they were ever detected at any study site, so that the guild estimates sum to the estimated N.
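One way to make the truncation check explicit is to measure how much posterior mass sits at the data-augmentation ceiling n + nzeroes. The sketch below is a plain-Python illustration that assumes the MCMC draws of N have already been exported from BUGS to a list; the function name `truncation_check` and the 1% tolerance are hypothetical choices, not part of the original model.

```python
def truncation_check(n_draws, n_obs, nzeroes, tol=0.01):
    """Share of posterior draws of N at the augmentation ceiling n + nzeroes.

    n_draws : posterior draws of total richness N (assumed exported from BUGS)
    n_obs   : number of species actually observed (n)
    nzeroes : number of all-zero augmented species
    tol     : hypothetical tolerance; above it, nzeroes is flagged as too small
    """
    ceiling = n_obs + nzeroes
    at_ceiling = sum(1 for N in n_draws if N >= ceiling)
    share = at_ceiling / len(n_draws)
    return share, share > tol


# Toy draws (made up for illustration): 2 of 5 draws hit the ceiling of 90
share, truncated = truncation_check([85, 88, 90, 90, 89], n_obs=40, nzeroes=50)
print(share, truncated)  # prints: 0.4 True
```

If the flag is raised, increase nzeroes and refit. Once the posterior of N no longer touches the ceiling, adding further latent species should leave it largely unchanged.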
Modelling the full regional species pool makes some sense, since n + nzeroes should not exceed the total number of species that could reasonably occur in the study region. On the other hand, this approach is almost never used, and some component of the community has to be considered "unknown", so I'm a little resistant to it (in large part because the data structuring is also more complex).
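Structurally, the regional-pool alternative amounts to replacing the anonymous all-zero species with named candidate species from a regional checklist, so that every row of the augmented array carries a guild label. A minimal sketch, assuming a hypothetical `regional_pool` dictionary that maps each candidate species to a single guild (the helper name and species names below are made up):

```python
def build_guild_indicators(observed, regional_pool):
    """Order species as observed first, then undetected candidates from the pool,
    and return one 0/1 guild-membership vector per guild over that full list.

    observed      : species detected at least once (the first n rows of the array)
    regional_pool : hypothetical dict, species -> guild, covering every species
                    that could plausibly occur in the study region
    """
    missing = [s for s in observed if s not in regional_pool]
    if missing:
        raise ValueError(f"observed species without a guild: {missing}")
    seen = set(observed)
    undetected = [s for s in regional_pool if s not in seen]
    species = list(observed) + undetected  # plays the role of n + nzeroes rows
    guilds = sorted(set(regional_pool.values()))
    indicators = {
        g: [1 if regional_pool[s] == g else 0 for s in species] for g in guilds
    }
    return species, indicators


pool = {"sp_A": "insectivore", "sp_B": "granivore",
        "sp_C": "insectivore", "sp_D": "granivore"}
species, ind = build_guild_indicators(["sp_B", "sp_A"], pool)
print(species)  # prints: ['sp_B', 'sp_A', 'sp_C', 'sp_D']
```

Because each species belongs to exactly one guild, the indicator vectors sum to 1 for every species, so the inner products of Z[j, ] with each vector add up to Nsite[j] exactly. The number of latent species is then fixed by the length of the checklist, so a posterior for N that piles up against the checklist size would suggest the checklist itself is incomplete.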
Any advice on the issue of latent species use and ecological group richness estimation in this context would be appreciated.
Brian
P.S. Below is the part of the BUGS code that estimates total richness and richness within guilds:
# Sum the observed species (n) and the augmented species estimated to be
# part of the community (n0) to get total estimated richness
n0 <- sum(w[(n+1):(n+nzeroes)])
N <- n + n0
# Loop over sites to get site-level richness for the whole community
# and for subsets (guilds or assemblages) of interest
for(j in 1:J){
  # all species, observed + augmented, times community membership w
  Nsite[j] <- inprod(Z[j,1:(n+nzeroes)], w[1:(n+nzeroes)])
  # observed species only: augmented species have no guild assignment
  Nguild1[j] <- inprod(Z[j,1:n], guild1[1:n])
}
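To see where the gap between the guild totals and the site total comes from, the derived quantities in the BUGS excerpt can be mirrored in plain Python for a single posterior draw. This is a sketch with made-up toy values for Z, w and the guild vectors; `derived_richness` is a hypothetical helper, `inprod` imitates the BUGS function of the same name, and indices are 0-based, unlike BUGS.

```python
def inprod(a, b):
    """Inner product, imitating the BUGS inprod() function."""
    return sum(x * y for x, y in zip(a, b))


def derived_richness(Z, w, n, guilds):
    """Derived richness quantities for one posterior draw.

    Z      : site x species latent occurrence matrix (n + nzeroes columns)
    w      : community-membership indicators (length n + nzeroes)
    n      : number of observed species (the first n columns)
    guilds : list of 0/1 guild vectors over the n observed species
    """
    n0 = sum(w[n:])  # augmented species estimated to be in the community
    N = n + n0
    n_site = [inprod(row, w) for row in Z]
    n_guild = [[inprod(row[:n], g) for row in Z] for g in guilds]
    return N, n_site, n_guild


# Toy draw: 3 observed species, 2 augmented species (w[3] = 1, w[4] = 0)
Z = [[1, 0, 1, 1, 0],
     [0, 1, 1, 0, 0]]
w = [1, 1, 1, 1, 0]
guilds = [[1, 0, 1], [0, 1, 0]]  # every observed species in exactly one guild
N, n_site, n_guild = derived_richness(Z, w, 3, guilds)
print(N, n_site, n_guild)  # prints: 4 [3, 2] [[2, 1], [0, 1]]
```

At the first site the two guilds add up to 2 while Nsite is 3; the missing species is the augmented species in column index 3, which is in the community and occupies that site but carries no guild. When the guilds partition the observed species, the per-site shortfall equals the number of augmented species present at that site.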