Multi-Method Data Analysis

Post by kdavis79 » Tue Aug 13, 2024 11:28 am

Hello, I’ve been reading a lot of these forum posts in an attempt to work out how to approach my data analysis, but I’m still struggling to wrap my head around it, so I figured I would post and ask for some insight. I haven't taken any formal occupancy course, but I've been working through the MacKenzie textbook and many of these forum threads, so if I've missed any resources that might be directly relevant, thank you in advance for directing me to them.

My research focuses on amphibian occupancy. We used two survey methods at each visit: (1) eDNA sampling (one homogenized sample taken) and (2) visual encounter surveys (VS; two surveys conducted by independent observers 15 minutes apart).

My study design includes:
~200 wetlands, about half of which were visited twice (once in 2022, once in 2023). The other half were visited in either 2022 or 2023 (or were potentially dry in one of those years, resulting in no survey). Therefore, some example detection histories look like:
2022[eDNA, VS, VS] 2023[eDNA, VS, VS]

010 NA NA NA (a site that was dry or not surveyed in 2023)
010 110 (a site surveyed in both years)
NA NA NA 110 (a site that was dry or not surveyed in 2022)
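
As an illustration only, the three example histories above can be held as six-value records with an explicit missing marker for unsurveyed years. This is a minimal Python sketch; the function name `parse_history`, the column labels, and the assumed column order (eDNA, VS, VS within each year) are hypothetical and not part of any occupancy package.

```python
from typing import List, Optional

# Assumed column order: 2022 [eDNA, VS, VS], then 2023 [eDNA, VS, VS].
COLUMNS = ["eDNA_2022", "VS1_2022", "VS2_2022",
           "eDNA_2023", "VS1_2023", "VS2_2023"]


def parse_history(text: str) -> List[Optional[int]]:
    """Turn a history such as '010 NA NA NA' into six values (None = not surveyed)."""
    values: List[Optional[int]] = []
    for token in text.split():
        if token == "NA":
            values.append(None)          # one missing survey
        else:
            values.extend(int(ch) for ch in token)  # e.g. '010' -> 0, 1, 0
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} surveys, got {len(values)}")
    return values


histories = [parse_history(h) for h in ("010 NA NA NA", "010 110", "NA NA NA 110")]
```

Keeping missing surveys as an explicit marker, rather than as zeros, matters because a zero would be read as a survey that took place without a detection.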

My research questions are primarily focused on identifying the variables that most strongly affect amphibian occupancy itself. Many of these variables differ across years (e.g., wetland area, local precipitation and temperature metrics) but some do not (e.g., impermeable surfaces, % willow). I also have many variables that could affect detection, but which only apply to one method or the other (e.g., eDNA detection could be impacted by pH, while VS detection is impacted by observer). While I want to take into account the additional information provided by multiple survey approaches, and account for different detection rates, that is not the primary focus of my research questions.

I see several different ways I could approach this analysis, each of which comes with potential cons. I will attempt to outline my thought process below, but I'll provide a bit of a TL;DR for the overarching conceptual questions at the end.

To help understand the different approaches I've described below, I've included diagrams in this Google Doc (I couldn't figure out how to include images here): https://docs.google.com/document/d/1awbvMA18fqJnnWG1KybBigPmCN8ppTCCqaWOqqPsgzU/edit?usp=sharing

Approach 1) A multi-scale approach in which theta is the year (2022 or 2023)

Understanding theta has taken a while but my interpretation in this scenario is that theta is the “availability of the species to be detected in a given year, by any method.” The natural problem with this approach is that I would be assuming closure between years, which is not biologically logical, although I could potentially make the assumption and then take great care in interpreting psi.

If I were to take this approach, is it valid to interpret theta as occupancy in 2022 or 2023, rather than as availability for detection?

Where would I incorporate my variables on occupancy? Some are variable across years, which would imply they need to be included as predictors of theta, but some are invariable – should these be predictors of psi? Or could I still include them at the theta level, and leave psi as intercept only?
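
For reference, the structure behind Approach 1 can be written out for a single site. The sketch below assumes the standard multi-scale parameterization (psi = site-level occupancy, theta = availability within a year given occupancy, p = detection given availability). The function name and all parameter values are hypothetical and are not taken from any package.

```python
from typing import Optional, Sequence


def multiscale_site_likelihood(
    history: Sequence[Sequence[Optional[int]]],  # one inner sequence per year
    psi: float,                                  # site-level occupancy
    theta: Sequence[float],                      # one availability value per year
    p: Sequence[Sequence[float]],                # detection, same shape as history
) -> float:
    """Likelihood of one site's history; None marks a survey that did not happen."""
    occupied = 1.0
    any_detection = False
    for year_obs, theta_t, p_t in zip(history, theta, p):
        surveyed = [(y, pk) for y, pk in zip(year_obs, p_t) if y is not None]
        if not surveyed:
            continue  # year not surveyed: it adds no information
        available = 1.0
        for y, pk in surveyed:
            available *= pk if y == 1 else 1.0 - pk
        detected = any(y == 1 for y, _ in surveyed)
        any_detection = any_detection or detected
        # "Not available this year" is only possible if nothing was detected.
        occupied *= theta_t * available + (0.0 if detected else 1.0 - theta_t)
    # "Site unoccupied" is only possible if nothing was detected in any year.
    return psi * occupied + (0.0 if any_detection else 1.0 - psi)


p_year = [0.5, 0.7, 0.7]  # hypothetical p for [eDNA, VS, VS]
lik = multiscale_site_likelihood(
    history=[[0, 1, 0], [None, None, None]],
    psi=0.6, theta=[0.5, 0.5], p=[p_year, p_year],
)
```

In this structure psi is a single value per site, so a variable that changes between years can only enter through theta (or p), while a site-constant variable can enter at any level.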

Something I’ve struggled with on multi-scale models in general – do they actually yield different estimates of p for each detection method? Or is method just incorporated as a predictor variable for detection and the measure of p is then calculated for each one?
In essence, does my output look like:
p.eDNA = 0.5, p.VS = 0.7 or
p = 0.4 + B(method)?
Can I have predictor variables for detection which are specific to each method type?
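
To make the two notations concrete: on the logit scale, a model with a method indicator already implies one detection probability per method, and a variable can be made method-specific by multiplying it by that indicator so it is zero for the other method. The following is a sketch with made-up coefficients; every name and number is hypothetical, and it is not RPresence output.

```python
import math

# Hypothetical logit-scale coefficients, for illustration only.
BETA = {
    "int_edna": 0.0,     # eDNA intercept
    "int_vs": 0.85,      # visual-survey intercept
    "ph_edna": -0.5,     # pH effect, applied to eDNA surveys only
    "observer_vs": 0.4,  # observer B effect, applied to visual surveys only
}


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def detection_prob(method: str, ph: float = 0.0, observer_b: int = 0,
                   beta: dict = BETA) -> float:
    """Detection probability for one survey, with method-specific covariates."""
    if method not in ("eDNA", "VS"):
        raise ValueError("method must be 'eDNA' or 'VS'")
    is_edna = 1.0 if method == "eDNA" else 0.0
    is_vs = 1.0 - is_edna
    eta = (beta["int_edna"] * is_edna
           + beta["int_vs"] * is_vs
           + beta["ph_edna"] * is_edna * ph               # zero for VS surveys
           + beta["observer_vs"] * is_vs * observer_b)    # zero for eDNA surveys
    return inv_logit(eta)


p_edna = detection_prob("eDNA")  # pH at its reference value of 0
p_vs = detection_prob("VS")      # observer A
```

Under these assumptions the "p.eDNA, p.VS" view and the "p = intercept + B(method)" view describe the same model: the per-method probabilities are obtained by back-transforming the linear predictor for each method.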


Approach 2) A multi-scale approach in which theta is the method (eDNA or Visual Survey)

This might resolve some of my questions regarding p in the first approach, in that I would have an estimate of p for each survey occasion and a more universal estimate of the effectiveness of eDNA surveys versus visual surveys in the form of theta. The interpretation of theta in the phrasing that I’ve seen here would be “the availability of the species to be detected by eDNA and by Visual Surveys.”
I’m not sure how predictor variables would break down hierarchically here:
In eDNA, all variables related to detection (conductivity, pH, turbidity) were collected each year, so they vary by year and would need to be included as variables on p.
Theta(eDNA) would thus have no variables directly affecting it.
In Visual Surveys, variables on detection include things like air temperature, surveyor, first or second survey, start time. Some of these things vary between surveys, some do not (e.g., except in rare cases, the weather does not change between surveys). However, none are constant between 2022 and 2023, therefore we could consider them all variable at the p level, and theta(VS) once again does not have any direct predictor variables.

As noted above, in this scenario, we are still assuming occupancy is closed across years, which is biologically problematic. Interpretation of psi would have to be done very carefully.

Problem: my research questions of interest are in variables that affect occupancy, many of which change across years. This framework allows no place to include those variables – I would have to standardize or average them across both years to include them as variables on psi.

Approach 3) Model 2022 and 2023 separately

While this approach does make the most sense biologically (no longer assuming occupancy is constant across years), I could lose a lot of power because the size of my detection history is reduced from 6 to 3 observations.

A multi-season approach is not really an option, as I don't have multiple visits within the same year/season: all surveys at a site took place at the same time (on the same day).

I’ve been debating if a multi-scale approach (separated by year) makes sense, but there would be little difference between theta and p for eDNA, since there would only be one eDNA observation.
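
This concern can be checked directly. Assuming the standard multi-scale parameterization, a unit with only one survey (here, the single eDNA sample within a year) contributes to the likelihood only through the product theta * p, so different (theta, p) pairs with the same product give identical values. A small sketch, with a hypothetical function name and made-up numbers:

```python
def single_survey_unit_term(y: int, theta: float, p: float) -> float:
    """Likelihood term for a unit with exactly one survey, given the site is occupied."""
    if y == 1:
        return theta * p
    # No detection: available but missed, or not available at all.
    return theta * (1.0 - p) + (1.0 - theta)  # algebraically 1 - theta * p


# Two different parameter pairs sharing the same product theta * p = 0.4.
terms_a = [single_survey_unit_term(y, theta=0.8, p=0.5) for y in (0, 1)]
terms_b = [single_survey_unit_term(y, theta=0.5, p=0.8) for y in (0, 1)]
```

Up to floating-point rounding, `terms_a` and `terms_b` agree, which is why theta and p cannot be separated for a method that has a single observation per unit.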

Harkening back to the first approach, can the basic single-season occupancy model provide two estimates of p? Do any approaches really model p separately for different methods, or is it more like the p = 0.4 + B(method) approach I described above? And within that, can I have predictor variables which are specific to each method type?

Some implementation questions (I was hoping to use RPresence):
Can you include variables for theta? I can’t tell how to construct a pao object that has a dataframe/list of variables for theta itself.
Can you do a multi-method approach with different numbers of surveys for each method? I noticed there’s an argument for number of methods (i.e. 2) but since I have a repeat survey with one method and not with the other, I’m not sure how that applies (does it just sequence over the detection history, i.e., assuming odd numbers are the first method and even numbers are the second method?).

TL;DR: Summarizing the more overarching questions rather than methodological details

When modeling detection for different methods, are the two methods actually modeled separately (i.e., p.eDNA ~ eDNA variables and p.VS ~ VS variables) or is method just another predictor for detection (i.e., p ~ METHOD + eDNA variables + VS variables)? Does this change depending on if you implement a multi-scale model or a basic single-season occupancy model?

When given two years of data over which it is not biologically reasonable to assume closure, with 3 observations from one time-point within each year, would the better approach be to combine observations and interpret psi with care or to model each year separately, despite the loss of power from fewer observations?

How does one balance the desire to incorporate multiple methods and estimate their detection rates with the desire to understand the variables driving occupancy, not detection?


If you read this far, I sincerely thank you - I'm sure it was not the most straightforward read. If you have any suggestions on where to start/overall approach, it will be greatly appreciated.
