State uncertainty in disease analysis

Postby simone77 » Fri Nov 04, 2011 8:43 am

This is also the title of a paper by Conn and Cooch (2009).
Having read this and other recent papers on the issue, it is still not clear to me what the best alternative is to the approach explained in that paper, which uses E-SURGE.
My feeling is that, at the moment, E-SURGE is software you can't learn to use by yourself (at least I can't! So far there is nothing analogous to the Gentle Introduction to MARK for it).
Although I hope to attend a workshop on this software, because it must be really worth it, I would like to understand what the best (and most easily accessible) option is if you are using MARK.

In Conn and Cooch (2009), they compared the results from M-SURGE on a censored dataset (with the "unknown"-state observations removed) with those from E-SURGE on the same dataset uncensored (using hidden Markov models with "unknown", "apparently infected" and "apparently not infected" states). They found that the latter approach gives much more precise estimates, particularly when unknown states make up a large percentage of the observations. They also stated:
Conn and Cooch, 2009 wrote: Censoring data when state cannot be ascertained is also a viable solution leading to unbiased estimators of parameters of interest. The issue then is whether to sacrifice some precision in favour of using a simpler model with fewer assumptions (e.g. by censoring data), or whether to employ a more complicated model to be able to utilize all available data (but perhaps at the expense of introducing bias if assumptions are not met).
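
To make the HMM idea concrete for myself, here is a minimal sketch of the conditional observation ("event") probabilities, assuming a single capture probability p and a per-state probability delta of actually ascertaining the state once captured (this is just my shorthand; the exact parameterisation in Conn and Cooch 2009 and in E-SURGE may differ):

\[
\begin{array}{l|cccc}
 & \text{seen as }I & \text{seen as }N & \text{seen as }u & \text{not seen}\\
\hline
\text{infected }(I) & p\,\delta_I & 0 & p\,(1-\delta_I) & 1-p\\
\text{not infected }(N) & 0 & p\,\delta_N & p\,(1-\delta_N) & 1-p
\end{array}
\]

The zeros are the point: an animal that is assigned a state is assigned its true state, and the "u" column carries all of the uncertainty.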

Faustino et al. (2004) used an approach that is not equivalent to the one implemented with M-SURGE in Conn and Cooch (2009) (classic multi-state on censored data).
Faustino et al. (2004) used a two-stage, sequential approach to deal with logistical limitations of their data (a situation very similar to the one I am dealing with).
They first carried out model selection on phi and p alone, using the complete dataset that included the unknown states (at this stage transition rates were constrained to vary only with disease state); they then modelled the transition rates on the censored dataset, holding the structure for phi and p fixed at what was selected in the first stage.
As also outlined in the Conn and Cooch paper, the first stage of Faustino et al. (2004) should typically give unbiased estimates of survival and encounter rates, and the second stage, as noted above, should give unbiased estimates of the transition rates.
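
(Just to be explicit about what I mean by censoring: here is a minimal sketch in plain Python, with hypothetical state codes, of recoding the "unknown" captures as non-detections for the classic multi-state analysis. Whether recoding to 0 or some other treatment is most appropriate is part of my question.)

Code:
    # Hypothetical 5-occasion histories: A/B = assigned disease states,
    # u = captured but state not ascertained, 0 = not captured.
    histories = ["A0uuA", "uu0uB", "B00u0"]

    def censor(history, unknown="u"):
        # Treat 'unknown' captures as non-detections in the censored dataset.
        return history.replace(unknown, "0")

    print([censor(h) for h in histories])
    # -> ['A000A', '0000B', 'B0000']; the second animal's first four captures
    #    vanish entirely, which is where the loss of precision comes from.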

It being understood that there are many other important issues in modelling disease dynamics by MCR (see Cooch et al. 2010 for a review), I have two questions about the best options for dealing with partially observed states in a context like that of the Faustino et al. paper.

Q1: Is the Faustino et al. (2004) approach the best way to deal with this situation using the (normally accessible) features in MARK?
Q2: If the answer to the first question is "yes", would you report the survival and encounter rates as estimated in the first stage?

Thanks for any help you might provide,

Simone

Re: State uncertainty in disease analysis

Postby ganghis » Fri Nov 04, 2011 11:31 am

Hi Simone,
Guess it's time I chimed in. I haven't followed the whole thread in detail, so pardon me if I didn't pick up on certain details of your analysis.

I'd suggest using the Conn & Cooch approach if you have a lot of 'unknown' state individuals. What's a lot? Well, my intuition says about 20% or so. Less than that, and the anticipated increase in precision is probably a lot smaller than the increase in headache that you'd get from having to learn new software or adapting existing MARK models to meet your needs (more on that later). There's nothing 'wrong' with censoring data here - the only parameter it biases in your case is detection probability, but who cares about that anyway?

I learned how to fit models in E-SURGE on my own, with the help of its documentation and some helpful presentations posted on Remi Choquet's website. There are also some more specific instructions in the online supplement to Conn & Cooch that will help you when you try to fit these partial-observation models. That said, I have a fair amount of experience and intuition with these models; others may find a workshop useful.

As far as fitting these in MARK, Bill Kendall, Gary White, and others are working on a new class of robust design models capable of mimicking hidden Markov models, but I'm not sure whether these have been included in new releases of MARK, or the extent to which they would be willing to share their research (I'm not sure where it is in the publication process).

Good luck!
Paul Conn

Re: State uncertainty in disease analysis

Postby cooch » Fri Nov 04, 2011 11:52 am

ganghis wrote: Hi Simone,
Guess it's time I chimed in. I haven't followed the whole thread in detail, so pardon me if I didn't pick up on certain details of your analysis.

I'd suggest using the Conn & Cooch approach if you have a lot of 'unknown' state individuals.


Agreed. The approach used in Faustino et al. (2004) was ad hoc -- more or less the best that could be done in the short run, but not the most robust approach given current technologies. I think the HMM approach (as Paul applied in Conn & Cooch) is the way to go.

What's a lot? Well, my intuition says about 20% or so. Less than that, and the anticipated increase in precision is probably a lot smaller than the increase in headache that you'd get from having to learn new software or adapting existing MARK models to meet your needs (more on that later). There's nothing 'wrong' with censoring data here - the only parameter it biases in your case is detection probability, but who cares about that anyway?


My intuition concurs -- it is a function both of the total proportion of 'unknown' states and of how they occur (i.e., relative degree of randomness wrt true state, length of encounter history, true encounter probability, etc.). I think the 20% value Paul mentions is a reasonable rule of thumb.

I learned how to fit models in E-SURGE on my own, with the help of its documentation and some helpful presentations posted on Remi Choquet's website. There are also some more specific instructions in the online supplement to Conn & Cooch that will help you when you try to fit these partial-observation models. That said, I have a fair amount of experience and intuition with these models; others may find a workshop useful.


Good point. If you have a fair understanding of Markov models and transition matrices, and are a 'quick study' in programming, E-SURGE doesn't take too long to learn -- and there is a sub-section of this forum expressly for questions related to E-SURGE. There is a fair bit of documentation, but the learning curve is somewhat steeper than that presented for MARK in the GI (this is not a criticism -- different 'curve', different target audiences...).

As far as fitting these in MARK, Bill Kendall, Gary White, and others are working on a new class of robust design models capable of mimicking hidden Markov models,


and extending them somewhat (omega parm)

but I'm not sure whether these have been included in new releases of MARK, or the extent to which they would be willing to share their research (I'm not sure where it is in the publication process).


Rumor has it that said paper will be out soon. I suspect that they'll be included in MARK at some point.

Re: State uncertainty in disease analysis

Postby jlaake » Fri Nov 04, 2011 11:54 am

MARK contains the models for unknown state with open and closed robust designs that Bill Kendall developed and Gary White implemented. I've also included those models in RMark. They can be a little difficult to get your mind around, but they have the advantage of working in either MARK or RMark. I've been using them with some data on weaning behavior in California sea lions.

--jeff

Re: State uncertainty in disease analysis

Postby simone77 » Mon Nov 07, 2011 9:13 am

Hi,

After reading your answers I have decided at least to try to explore E-SURGE by myself; in the worst case I will not be able to do the analysis, but I trust it will not be physically painful...
Joking aside, it will be useful to me in any case.
I don't have a strong maths and statistics background, but I am trying to fill that gap; for instance, a few months ago I started studying transition matrices, Markov chains and so on.

Regarding the 20% rule of thumb for deciding whether the E-SURGE approach is worthwhile in these cases (above that value there should be a real improvement in the precision of the estimates), my situation is much more "extreme" than that: I have 80-85% unknown observations (depending on the rabbit enclosure). Any comments on this?

Jeff Laake wrote: MARK contains the models for unknown state with open and closed robust designs that Bill Kendall developed and Gary White implemented. I've also included those models in RMark. They can be a little difficult to get your mind around, but they have the advantage of working in either MARK or RMark. I've been using them with some data on weaning behavior in California sea lions.


If I have understood correctly which Kendall paper you are referring to, it is about misclassification of states, and that is not my case, because I have blood-sampled individuals (whose state is known) and individuals that are not blood-sampled (unknown state). I had a look at the MARK help and found only the robust-design multi-state model with misclassification.

Thank you all for answering; as many people have said before me, this forum is really useful.

Simone

Re: State uncertainty in disease analysis

Postby jlaake » Mon Nov 07, 2011 11:03 am

Bill has a new paper coming out in Ecology. Also, don't be misled by "misclassification": it really is state uncertainty. In the original example it could be considered misclassification because there are two states, A where something like a calf is present and B where it is not, and some B's are really A's whose calf was simply not seen. In reality, though, what you have is A, B and u, where u is the uncertain state, and you never actually observe state B because you can't be certain the animal didn't have a calf. The term misclassification is misleading because it does not mean that an animal observed as A might actually be B; you can't have that in the model. In your case you would have states A and B, which you determine for the individuals you blood-sample, and the rest will be u (non-sampled), which are a mix of A and B individuals. Maybe Bill will chime in here, but I think this is correct.
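
To make the coding concrete, a couple of hypothetical 5-occasion histories (A/B determined from a blood sample, u = caught but not sampled, 0 = not caught):

    A0uuA   state ascertained on occasions 1 and 5, caught but unsampled on 3 and 4
    uu0u0   never blood-sampled, so its true state (A or B) is never observed, but it still contributes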

--jeff

Re: State uncertainty in disease analysis

Postby cooch » Mon Nov 07, 2011 11:28 am

jlaake wrote: Bill has a new paper coming out in Ecology. Also, don't be misled by "misclassification": it really is state uncertainty. In the original example it could be considered misclassification because there are two states, A where something like a calf is present and B where it is not, and some B's are really A's whose calf was simply not seen. In reality, though, what you have is A, B and u, where u is the uncertain state, and you never actually observe state B because you can't be certain the animal didn't have a calf. The term misclassification is misleading because it does not mean that an animal observed as A might actually be B; you can't have that in the model. In your case you would have states A and B, which you determine for the individuals you blood-sample, and the rest will be u (non-sampled), which are a mix of A and B individuals. Maybe Bill will chime in here, but I think this is correct.

--jeff


There is a lot of ambiguity, and some overlap, in definitions. Paul Conn and I have talked about this in various places, as has Bill. There is also a bunch of text related to this issue in the ponderous 'disease modeling estimation' paper that Paul, Ken Pollock, Steve Ellner, Andy Dobson and I worked up last year.

Basically, suppose a disease state (say, infected vs. not infected) is at least partially observable, with some non-zero probability. Then, given that an individual is observed, it is either correctly assigned to a disease state or not -- this is the misclassification question. That is, if it is assigned at all (and this is the key point). For disease systems where an individual is observed alive but its state is not observed, what do you do? This is the basis for Conn & Cooch (2009). You can either model it using an HMM, or censor the data. Alternatively, even when a state is assigned, there is a probability of misclassification -- you assign it to a state, but get it wrong.
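
In matrix terms (my sketch, with delta the probability of assigning a state at all and alpha the probability of getting the assignment wrong given that one is made): state uncertainty puts the extra probability mass in an "unknown" column, whereas misclassification puts it in the off-diagonal cells,

\[
\begin{pmatrix} \delta & 0 & 1-\delta \\ 0 & \delta & 1-\delta \end{pmatrix}
\quad\text{versus}\quad
\begin{pmatrix} 1-\alpha & \alpha \\ \alpha & 1-\alpha \end{pmatrix},
\]

with rows = true state (infected, not infected) and columns = assigned state (plus "unknown" in the first case), conditional on the animal being detected.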

Re: State uncertainty in disease analysis

Postby jlaake » Mon Nov 07, 2011 12:49 pm

If I understand correctly what is in MARK, it will handle state uncertainty but not misclassification in the sense of recording A when the animal is really B. What it does is add a state u, and either A or B may be recorded as u with a probability structure specified by Delta. There are also two other parameters of importance here: Omega is the proportion in each state (A and B), and pi is a nuisance parameter giving the proportion in each state (A or B) of those first seen in state u. It takes a little thinking to get your mind around the difference between pi and Omega, but Omega is the one you'll be interested in for inference.
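
One way to see why pi and Omega differ (my shorthand, not necessarily the exact parameter definitions used in MARK): if Omega_A is the proportion of the population truly in state A, delta_A is the probability that a captured state-A animal actually has its state determined, and capture probability does not depend on state, then for a first capture

\[
\pi_A \;=\; \Pr(\text{true state }A \mid \text{first seen as }u)
\;=\; \frac{\Omega_A\,(1-\delta_A)}{\Omega_A\,(1-\delta_A) + \Omega_B\,(1-\delta_B)},
\]

so pi equals Omega only when delta is the same for both states, which is why both parameters appear in the model but Omega is the one of biological interest.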

regards --jeff

Re: State uncertainty in disease analysis

Postby ganghis » Mon Nov 07, 2011 12:50 pm

Hi Simone,
In theory, the 80% figure shouldn't be a problem, but theory doesn't always apply nicely to real datasets. In practice, there may end up being some instability depending on how sparse the data are. In addition to the normal parameter ID diagnostics, I'd also be on the lookout for solutions that have 100% of the unknown states being assigned to one of the disease states (this may point to instability). It would also be worth running the model with alternative starting values to make sure you're getting the same set of estimates back.
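
If it helps, that last check is easy to script outside the fitting software. A toy sketch (plain Python/scipy, with a stand-in objective function rather than anything E-SURGE-specific) of refitting from several random starting values and checking that all starts reach the same maximum:

Code:
    import numpy as np
    from scipy.optimize import minimize

    def neg_log_lik(theta):
        # Stand-in for the model's negative log-likelihood; in practice this is
        # whatever E-SURGE or MARK is maximising for your model.
        return float(np.sum((theta - np.array([0.2, 0.7])) ** 2))

    rng = np.random.default_rng(1)
    fits = [minimize(neg_log_lik, rng.uniform(0, 1, size=2), method="Nelder-Mead")
            for _ in range(10)]

    best = min(f.fun for f in fits)
    # Large gaps here (or different parameter values at the same likelihood)
    # suggest a multimodal or nearly flat likelihood surface.
    print([round(f.fun - best, 4) for f in fits])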
-Paul

Re: State uncertainty in disease analysis

Postby simone77 » Tue Nov 08, 2011 6:00 pm

Thank you for your answers.
I guess the omega parameter Jeff Laake is talking about (implemented in RMark) is the same one mentioned above by Evan Cooch as a new MARK feature that will be detailed in the upcoming paper by White, Kendall and others.

While trying to firm up my ideas and start working with E-SURGE, I found a very nice paper on using Markov chains to study wildlife disease dynamics (Zipkin et al. 2010), which made me think about something that will be very naive for anyone proficient with Markov chains (not my case): the link between time-interval length and the state-transition parameters.

As outlined in the above paper, "The time step for the one-step transition probabilities should be defined so that it is only possible for one transition to occur during each time interval. If the time step is unreasonable for a given disease, then the metrics calculated (in this context something like expected duration in each state, life expectancy from the first observation and others) using Markov chain models may be incorrect (e.g. if the time step is too large and multiple transitions can occur, then expected duration in each state may be overestimated)".

Another point to take into account is how uneven time intervals should be handled in the multi-state context. Cooch said in a post in this forum: "Unequal intervals in a MS model are more problematic though, given the dimensionality of state uncertainty if (say) you miss an occasion. In fact, the preferred approach is to code the unequal interval as a missing occasion in the input file (using the 'dot' notation), and then proceeding from there."

I have quite large and uneven time intervals (i.e. 4.63, 3.33, 3.5, 4.17, 2.1, 3.27 and 2.77 months).
In their case the multi-state model considered three states (infected, not infected, dead), whereas in my case I don't have the dead state, which with such large time intervals would have made things even worse. Even so, I believe more than one transition might occur during an interval.

Besides not knowing how to use the dot notation in this case (there is no easy common denominator among the interval lengths), I am not sure how the large time intervals could affect my analysis. Do you have any suggestions on this?
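
To make my worry concrete, here is a toy sketch (plain Python/scipy, with made-up monthly rates and only my two observable states) of how the transition probabilities over one of my longer intervals already absorb possible back-and-forth transitions:

Code:
    import numpy as np
    from scipy.linalg import expm

    # Hypothetical continuous-time generator for 'not infected' <-> 'infected',
    # in units of per month; the rates are purely illustrative.
    Q = np.array([[-0.15,  0.15],   # N -> I
                  [ 0.05, -0.05]])  # I -> N

    for months in (1.0, 2.1, 4.63):
        P = expm(Q * months)        # transition probability matrix over the interval
        print(months, np.round(P, 3))
    # Over the longer intervals, P already reflects possible multiple transitions
    # within the interval, which is Zipkin et al.'s point about choosing the time step.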

Simone
