Fitting Hidden Markov Models (HidMarkov)

posts related to the RMark library, which may not be of general interest to users of 'classic' MARK

Fitting Hidden Markov Models (HidMarkov)

Postby chris_oosthuizen » Sun Sep 12, 2021 3:00 pm

I am currently involved with a project on albatross capture-recapture modelling. These birds often only breed every second year and CMR models thus requires one or more unobservable states to be specified. Additionally, there may be some uncertainty in field observations (e.g., whether an individual was a failed or successful breeder may not be known). Multistate, or multistate hidden markov models are thus required. I am familiar with such models in ESurge, but I now want to learn how to fit these in RMark. For this, I have taken a step-wise approach:

1) Create a very simple, fake data set
2) Fit a multistate model (states: successful breeder, failed breeder)
3) Fit a multistate model with 1 unobservable state* (states: successful breeder, failed breeder, non-breeder*)
4) Fit a multistate model with 2 unobservable states* (states: successful breeder, failed breeder, post-successful-nonbreeder*, post-failed-nonbreeder*)
5) Fit a multistate HMM model with 2 unobservable states* and uncertainty in state assignment (states: successful breeder, failed breeder, post-successful-nonbreeder*, post-failed-nonbreeder*)

I provide basic R code for these models below, should this be useful to anyone. When fitting models 2, 3 and 4 I obtain the same results as in ESurge. I have difficulty to fit the HMM model in Rmark.

Some questions:

1) What should be labelled as an 'event' in RMark code? I think it is only the number (or letter) that represents uncertain observations in the encounter history matrix.

2) I have some difficulty to understand the 'pi' parameter as implemented in Mark. According to the Mark help file, "The pi parameter is the probability of the EVENT on the first capture." In contrast, in ESurge, pi is "‘initial state probabilities’, which describe the probability that an individual is in one or another state when it is first encountered". So, in ESurge, I would set pi to zero for the post-successful-nonbreeder and post-failed-nonbreeder states of my model, because these are unobservable states, and an animal cannot enter the population in this way. In Mark, I understand that I will not do this?

3) In Mark, the Delta parameters are the probability of determining the state given detection. I am not sure how to create an indicator variables for Delta in the ddl (design data list). For example, in my example (below), I have a single uncertain event (coded '5') which may refer to successful or failed breeders (though with different probabilities for each state). I'd be very grateful for any hints.

Any specific or general advice on fitting HMMs using RMark will be much appreciated!
Chris

Code: Select all

#---------------------------
# Make some multistate data:
# 1 = succesful breeder
# 2 = failed breeder
# 0 = not seen
#---------------------------

df = c('1010001', '1010000', '1010000', '1010000', '1010100',
'1000000', '1000001', '1000001', '1000001', '1010101', '1010101', '1010101',
'1010101', '1010101', '1010101', '1010101', '1010101', '1010001', '1000001',
'1000001', '1000001', '1010101', '1010101', '1010101', '1010101', '1010101',
'1010101', '1010201', '1010201', '1010201', '1010201', '1010201', '1010201',
'1010201', '1010201', '1010201', '1010201', '1010201', '1010201', '1010201',
'1010201', '1010201', '1010201', '2010201', '2010000', '2000000', '2000000',
'2000000', '2000000', '2000000', '2010220', '2010221', '2010221', '2010221',
'2010221', '2010221', '2010221', '2010221', '2010221', '2010221', '2012221',
'2012222', '2012222', '2012222', '2012221', '2012221', '2012020', '2022010',
'2022010', '2022010', '2022010', '2022010', '2022010', '2022010', '2022010',
'2022010', '2021010', '2021010', '2021010', '2021010', '2021010', '2021010',
'2001000', '2021010', '2021010', '2021010')

df=as.data.frame(df); names(df) = "ch"; df$freq = 1; df$ch = as.character(df$ch)
head(df)

#------------------------------------------------------
# Fit multi-state model
# 3 states (successful breeder / failed breeder / dead)
# 3 events (successful breeder / failed breeder / not seen) 
#------------------------------------------------------

library(RMark)

dp = process.data(df, model = "Multistrata") 

## create design data
ddl = make.design.data(dp)

# define model to fit
Phi.ct = list(formula=~1)
p.ct = list(formula=~1)
Psi.class = list(formula=~stratum)
 
# constant survival, constant recapture
Model.1 = mark(dp, ddl, model.parameters=list(S = Phi.ct,
                                              p = p.ct,
                                              Psi = Psi.class),
                   output = FALSE,delete=T, model ="Multistrata")

summary(Model.1)   # parameters matches E-surge output


#------------------------------------------------------
# Fit multi-state model with 1 unobservable state
# 4 states (successful breeder / failed breeder / nonbreeder(= unobservable) / dead)
# 3 events (successful breeder / failed breeder / not seen)
#------------------------------------------------------

# state 3 = unobservable
dp = process.data(df, model = "Multistrata", strata.labels = c("1", "2", "3")) 

## create design data
ddl = make.design.data(dp)

# Set p = 0 for state 3 because non-breeders cannot be observed
ddl$p$fix[ddl$p$stratum=="3"]=0

# define model to fit
Phi.ct = list(formula=~1)
p.ct = list(formula=~1)
Psi.class =list(formula =~ -1+stratum:tostratum)   # see page 918 C.17. Multi-strata example in Mark book

table(ddl$Psi[,c("stratum","tostratum")])

# constant survival, constant recapture
Model.2 = mark(dp, ddl, model.parameters=list(S = Phi.ct,
                                              p = p.ct,
                                              Psi = Psi.class),
                   output = FALSE,delete=T, model ="Multistrata")

summary(Model.2)   # matches E-surge output


#----------------------------------------------------------------------------
# Fit multi-state model with 2 unobservable states
# 5 states (successful breeder / failed breeder / post-successful (unobservable) / post-failed (unobservable) / dead)
# 3 events (seen as successful breeder / failed breeder / not seen)
#----------------------------------------------------------------------------

# state 3 & 4 = unobservable
dp = process.data(df, model = "Multistrata", strata.labels = c("1", "2", "3", "4")) 

## create design data
# Set the transition that is computed by subtraction with the subtract.stratum argument
# Important! The ψ that you want to set so zero cannot be computed by
# subtraction, so you need to set the subtract.stratum appropriately.
# See page 919 in the Mark book
# Here I compute the same states as for GEPAT in ESurge:
ddl = make.design.data(dp, parameters=
list(Psi=list(subtract.stratum=c("3","4","3","4"))))

# Set movement parameters (PSI) to 0 (cannot move from successful to post-failed, for example)
ddl$Psi$fix[ddl$Psi$stratum=="1" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="2" & ddl$Psi$tostratum=="3"]=0

ddl$Psi$fix[ddl$Psi$stratum=="3" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="4" & ddl$Psi$tostratum=="3"]=0

ddl$p$fix[ddl$p$stratum=="3"]=0
ddl$p$fix[ddl$p$stratum=="4"]=0

# define model to fit
Phi.ct = list(formula=~1)
p.ct = list(formula=~1)
Psi.class =list(formula =~ -1+stratum:tostratum)   # see page 918 C.17. Multi-strata example in Mark book

table(ddl$Psi[,c("stratum","tostratum")])

# constant survival, constant recapture
Model.3 = mark(dp, ddl, model.parameters=list(S = Phi.ct,
                                              p = p.ct,
                                              Psi = Psi.class),
                   output = FALSE,delete=T, model ="Multistrata")

# match E-surge output!
summary(Model.3)

Psilist=get.real(Model.3,"Psi",vcv=T)
Psivalues=Psilist$estimates
TransitionMatrix(Psivalues[Psivalues$time==1,],vcv.real=Psilist$vcv.real)


# ----------------------------------------------------------------------------------
# Fit HMM model with 2 unobservable states
# 5 states (successful breeder / failed breeder / post-successful (unobservable) / post-failed (unobservable) / dead)
# 4 events (seen as successful breeder / failed breeder / not seen / seen with uncertain breeding state)
# event 1 = Successful breeder
# event 2 = Failed breeder
# event 5 = uncertain breeder

# Warning: this analysis is not correct and should not be used as a template
#-----------------------------------------------------------------------------------

# Add uncertainty (event = '5') to the input data
df = c('1010001',
'1050000', '1010000', '1010000', '1010100', '1000000', '1000001', '1000001',
'1000001', '1010105', '1050101', '1010101', '1050101', '1010101', '1010101',
'1010101', '1010101', '1010001', '1000001', '1000001', '1000001', '1010101',
'1010101', '1010101', '1050101', '1010101', '1010101', '1010201', '1010201',
'1010201', '1010201', '1010201', '1010201', '1010201', '1010201', '1010201',
'1010201', '1010201', '1010201', '1010201', '1010201', '1050201', '1010205',
'2010201', '2010000', '2000000', '2000000', '2000000', '2000000', '2000000',
'2050220', '2010221', '2010221', '2050221', '2010221', '2010221', '2010221',
'2010221', '2010521', '2010251', '2012221', '2012222', '2012222', '2012222',
'2012251', '5012221', '2012020', '2022010', '2025010', '2022010', '2022010',
'2052010', '2022010', '2022010', '2022010', '2022010', '2021010', '2021010',
'2021010', '2051010', '2051010', '5021010', '2001000', '2025010', '2021010',
'2021010')

df=as.data.frame(df); names(df) = "ch"; df$freq = 1; df$ch = as.character(df$ch)
head(df)

# state 3 & 4 = unobservable
# seems like 'event' is only for the uncertain observation - not the 'state' observations!
# Error in process.data: unused argument (events) if package marked is loaded in session
dp = process.data(df, model = "HidMarkov", strata.labels = c("1", "2", "3", "4"),
                  events = c("5")) 

## create design data
ddl = make.design.data(dp)

# Set movement parameters (PSI) to 0
ddl$Psi$fix[ddl$Psi$stratum=="1" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="2" & ddl$Psi$tostratum=="3"]=0

ddl$Psi$fix[ddl$Psi$stratum=="3" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="4" & ddl$Psi$tostratum=="3"]=0

ddl$p$fix[ddl$p$stratum=="3"]=0
ddl$p$fix[ddl$p$stratum=="4"]=0

# animals can only enter as breeders, not as post-breeders (unobservable)

# In HMMs, the pi parameter is the probability of the event on the first capture.
# Note: P(event)! not as in ESurge, where it is P(state on first capture)! (?)
# Should this be fixed to zero?
ddl$pi$fix[ddl$pi$stratum=="3"]=0
ddl$pi$fix[ddl$pi$stratum=="4"]=0

# The Delta parameters are the probability of determining the state given detection
# with probability p.
ddl$Delta
ddl$Delta$fix[ddl$Delta$stratum=="1"]=1
#ddl$Delta$fix[ddl$Delta$stratum=="2"]=1

# define model to fit
Phi.ct = list(formula=~1)
p.ct = list(formula=~1)
Psi.class =list(formula =~ -1+stratum:tostratum)   # see page 918 C.17. Multi-strata example in Mark book

pi.dot = list(formula=~1) 
Delta.class  = list(formula=~stratum)
#The delta parameters are the probability of determining the state given detection
# with probability p.
# Delta model
# create indicator variables for
#ddl$Delta$fix = ifelse(ddl$Delta$p =="5", 1, 0)

table(ddl$Psi[,c("stratum","tostratum")])
table(ddl$pi[,c("stratum")])
table(ddl$Delta[,c("event")])


# Aim: run a HMM where uncertain events (=5) could be successful breeders or unsuccesful breeders

Model.4 = mark(dp, ddl, model.parameters=list(S = Phi.ct,
                                              p = p.ct,
                                              Psi = Psi.class,
                                              pi = pi.dot,
                                              Delta = Delta.class),
                   output = FALSE,delete=T, model ="HidMarkov", mlogit0 = T) 

summary(Model.4)


chris_oosthuizen
 
Posts: 10
Joined: Wed Sep 12, 2012 10:37 am

Re: Fitting Hidden Markov Models (HidMarkov)

Postby jlaake » Mon Sep 13, 2021 11:56 am

I need to add an example for this as I can see it can be confusing. Maybe I can use yours once you get it running.

1) events are observable values in ch that are not states/strata. Note that the "error" from process.data is not an error. Just a note to let you know that more than one of the states are not observed in ch.

2) Here is an example of the definition of pi from the help file in MARK. Except I changed to your case and embellished some

pi(1|5) -- Probability of being state in 1 given observed as event 5 (for initial capture)
pi(2|5) -- Probability of being state in 2 given observed as event 5
pi(3|5) -- Probability of being state in 3 given observed as event 5
pi(4|5) -- Probability of being state in 4 given observed as event 5
with one of these obtained by subtraction.

The last one is dropped from ddl which is your point of confusion possibly. In your case you can fix pi(1|5) and pi(2|5) to 0 and then you can estimate pi(4|5) and p(5|5) will be estimated as 1-pi(4|5).

Something like the following is supposed to work but doesn't seem to so I'll look into that. I have a subtract.events that works with Delta but I need to add to pi as well.
ddl = make.design.data(dp,parameters=list(pi=list(subtract.stratum="1")))

3) delta is the probability of being classified as an event/state to an observation in each state. By default, the probability of classifying the state correctly is computed by subtraction. In your case, presumably
delta(1|5)=delta(2|5)=0 and so delta(1|1)=delta(2|2)=1. So you can fix the latter to 0 in the design data for those strata. Then you can let delta(3|5) and delta(4|5) be estimated. Not sure what you mean by assigning an indicator variable. I'm guessing that it is between 3 and 4? Something like ddl$Delta$indicator=ifelse(ddl$Delta$stratum=="4",1,0) would work. The intercept would be for 3 and the difference between 3 and 4 would be the value for indicator in your formula. This all presumes you are fixing Delta to 0 for states 1 and 2.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Fitting Hidden Markov Models (HidMarkov)

Postby chris_oosthuizen » Wed Sep 15, 2021 12:40 pm

Jeff, this was very useful, thank you. Thanks also to Bill Kendall who gave me input offline. I provide feedback here should this be useful to others. The script below fits a HMM in RMark. The results are the same as from (direct fitting in) MARK and very close to those from ESurge.

With regards to pi: In my example I would like to estimate pi(1|5) and pi(2|5) (entry as successful or failed breeder, seen with uncertain event). MARK defaults to deriving the probability of the last state listed by subtraction. The latter was at first a problem for me (see Jeff's comment that subtract.events that works with Delta does not currently work with pi), since I'd like to fix pi for states 3 and 4 to 0. The solution is simply to list states 1 or 2 last when you specify the states in "process.data" (see R code below).

The (made-up) encounter history provided below is perhaps not the best, but it shows correct (basic) code for this model type.

Code: Select all

# ----------------------------------------------------------------------------------
# Fit HMM model with 2 unobservable states
# 5 states:
   # 1 = successful breeder
   # 2 = failed breeder
   # 3 = post-successful breeder (unobservable)
   # 4 = post-failed breeder (unobservable)
   # 5 = dead

# 4 field observations:
   # 0 = not seen
   # 1 = seen as successful breeder
   # 2 = seen as failed breeder
   # 5 = seen with uncertain breeding state (** this is an event in MARK terms)

#-----------------------------------------------------------------------------------

library(RMark)

# Add a few uncertainty events (event = '5') to the above input data
df = c('1010001',
'1050000', '1010000', '1010000', '1010100', '1000000', '1000001', '1000001',
'1000001', '1010105', '1050101', '1010101', '1050101', '1010101', '1010101',
'1010101', '1010101', '1010001', '1000001', '1000001', '1000001', '1010101',
'1010101', '1010101', '1050101', '1010101', '1010101', '1010201', '1010201',
'1010201', '1010201', '1010201', '1010201', '1010201', '1010201', '1010201',
'1010201', '1010201', '1010201', '1010201', '1010201', '1050201', '1010205',
'2010201', '2010000', '2000000', '2000000', '2000000', '2000000', '2000000',
'2050220', '2010221', '2010221', '2050221', '2010221', '2010221', '2010221',
'2010221', '2010521', '2010251', '2012221', '2012222', '2012222', '2012222',
'2012251', '5012221', '2012020', '2022010', '2025010', '2022010', '2022010',
'2052010', '2022010', '2022010', '2022010', '2022010', '2021010', '2021010',
'2021010', '2051010', '2051010', '5021010', '2001000', '2025010', '2021010',
'2021010')

df=as.data.frame(df); names(df) = "ch"; df$freq = 1; df$ch = as.character(df$ch)
head(df)

# 'event' is only for the uncertain observation - not the 'state' observations!
# Error in process.data: unused argument (events) if package marked is loaded in session
# states 3 & 4 = unobservable

# Bill Kendall: For pi, MARK defaults to deriving the probability of the last state listed by subtraction.
# Since we like to fix pi for states 3 and 4 to 0, the solution is to list states 1 or 2
# last when you specify the states.

dp = process.data(df, model = "HidMarkov", strata.labels = c("3", "4","1", "2"),
                  events = c("5")) 

## create design data
ddl = make.design.data(dp)

# Set movement parameters (PSI) to 0
ddl$Psi$fix[ddl$Psi$stratum=="1" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="2" & ddl$Psi$tostratum=="3"]=0

ddl$Psi$fix[ddl$Psi$stratum=="3" & ddl$Psi$tostratum=="4"]=0
ddl$Psi$fix[ddl$Psi$stratum=="4" & ddl$Psi$tostratum=="3"]=0

# Set unobservable p to 0
ddl$p$fix[ddl$p$stratum=="3"]=0
ddl$p$fix[ddl$p$stratum=="4"]=0

# animals can only enter as breeders, not as post-breeders (unobservable)
# In HMMs, the pi parameter is the probability of the event on the first capture.
# Note: P(event)! and not as in ESurge, where it is P(state on first capture)!

# Set unobservable pi to 0
ddl$pi$fix[ddl$pi$stratum=="3"]=0
ddl$pi$fix[ddl$pi$stratum=="4"]=0

# The Delta parameters are the probability of determining the state given detection
# with probability p (MARK defaults to obtaining the probability of correctly assigning
# the state by subtraction)
ddl$Delta$fix[ddl$Delta$stratum=="3"]=0
ddl$Delta$fix[ddl$Delta$stratum=="4"]=0


# define a model to fit
# make a survival class (successful = post-successful != failed = post-failed)
ddl$S$class = ifelse(ddl$S$`3`=="1" |ddl$S$`1`=="1" , 1, 0)

Phi.ct = list(formula=~class)
p.ct = list(formula=~stratum)
Psi.class =list(formula =~ -1+stratum:tostratum)   # see page 918 C.17. Multi-strata example in Mark book
pi.ct = list(formula=~stratum) 
Delta.class  = list(formula=~stratum)

table(ddl$Psi[,c("stratum","tostratum")])
table(ddl$pi[,c("stratum")])
table(ddl$Delta[,c("event")])


# Aim: run a HMM where uncertain events (=5) could be successful breeders or unsuccesful breeders

Model.4 = mark(dp, ddl, model.parameters=list(S = Phi.ct,
                                              p = p.ct,
                                              Psi = Psi.class,
                                              pi = pi.ct,
                                              Delta = Delta.class),
                   output = FALSE,delete=T, model ="HidMarkov", mlogit0 = T) 

summary(Model.4)


Psilist=get.real(Model.4,"Psi",vcv=T)
Psivalues=Psilist$estimates
TransitionMatrix(Psivalues[Psivalues$time==1,],vcv.real=Psilist$vcv.real)


chris_oosthuizen
 
Posts: 10
Joined: Wed Sep 12, 2012 10:37 am

Re: Fitting Hidden Markov Models (HidMarkov)

Postby jlaake » Wed Sep 15, 2021 2:51 pm

Glad you got this all setup. If it is okay with you I'll add as an example to the next release of RMark. The next release will fix the problem with subtract.stratum for pi.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Fitting Hidden Markov Models (HidMarkov)

Postby chris_oosthuizen » Fri Sep 17, 2021 3:18 pm

You are very welcome to use this as an example Jeff. I had a look at another example dataset (real this time) from Sooty shearwaters ('titis2.txt', available on Olivier Gimenez's Github page (https://github.com/oliviergimenez/multievent_jags_R). He fitted HMM models using Nimble, JAGS, R and ESurge. Below I fit the same model in RMark (*note that pi is defined differently, as mentioned earlier).

I could not get subtract.events to work with Delta? I tried:
Code: Select all
ddl = make.design.data(dp, parameters=
        list(Delta=list(subtract.events=c("1","2"))))
   #    list(Psi=list(subtract.stratum=c("1","2"))))

ddl


If I run the Psi line of code, it updates the ddl to what numbers I list,
Code: Select all
 $pimtypes$Psi$subtract.stratum
[1] "1" "2"


but the ddl does not seem to change with
Code: Select all
 list(Delta=list(subtract.events=c("1","2"))))


In the example below, I thus report (1 - D) to compare to Olivier's results.

Code: Select all

library(RMark)

# Get titis2.txt. file from
# https://github.com/oliviergimenez/multievent_jags_R
# Fitting multievent models in R, Nimble and JAGS
# by Olivier Gimenez

df <- read.table('titis2.txt')

library(tidyverse)

df = df %>%
      unite(ch, c("V1","V2","V3","V4","V5","V6","V7"), sep = "")

names(df) = "ch"; df$freq = 1; df$ch = as.character(df$ch)
head(df)

# OBSERVATIONS
# 0 = non-detected
# 1 = seen and ascertained as non-breeder
# 2 = seen and ascertained as breeder
# 3 = not ascertained
 
# STATES
# 1 = alive non-breeder
# 2 = alive breeder
# 3 = dead


# process data. events are only for the uncertain observation - not the 'state' observations
dp = process.data(df, model = "HidMarkov", strata.labels = c("1", "2"),
                                           events = c("3")) 

## create design data
ddl = make.design.data(dp, parameters=
       list(Delta=list(subtract.events=c("1","2"))))   # subtract.events does not seem to have an effect in ddl?
   #    list(Psi=list(subtract.stratum=c("1","2"))))

ddl

# define a model to fit
Phi.c = list(formula=~stratum)
p.c =   list(formula=~stratum)
Psi.c = list(formula =~ stratum)
pi.c =  list(formula=~1) 
Delta.c = list(formula=~stratum)

Model.HMM = mark(dp, ddl, model.parameters=list(S = Phi.c,
                                              p = p.c,
                                              Psi = Psi.c,
                                              pi = pi.c,
                                              Delta = Delta.c),
                   output = FALSE, delete=T, model ="HidMarkov", mlogit = T) 

summary(Model.HMM)

# Comparison of results:


#             O Gimenez           O Gimenez
#               Esurge           R (max likelihood)    Rmark
# S(1,1)     0.81369        > phiNB     0.814038      0.81404
# S(1,2)     0.83782        > phiB      0.8374951     0.83749
# T(2,1)     0.23133        > psiB-NB   0.226479      0.22647
# T(1,2)     0.21493        > psiNB-B   0.2194115     0.21940
# p(1)       0.56736        > pNB       0.5646503     0.56464
# p(2)       0.59451        > pB        0.5977104     0.59772
# D(2,2)     0.18049        > deltaNB   0.1878003     0.18780* (= 1 - D)
# D(3,3)     0.74591        > deltaB    0.7375783     0.73758* (= 1 - D)


chris_oosthuizen
 
Posts: 10
Joined: Wed Sep 12, 2012 10:37 am


Return to RMark

Who is online

Users browsing this forum: No registered users and 1 guest

cron