## mlogit formula specification

questions concerning analysis/theory using the R package 'marked'

### mlogit formula specification

I got the following question from a user about the formula for Psi in a multistate model.

I am trying to fit CJS multistate models to a capture-recapture turtle dataset, but I am a bit confused about model specification for the transition probabilities (Psi).
Could you please explain what the specifications below are actually modelling? (I am not sure whether the first two make sense.)

`Psi=list(formula=~-1+stratum)`

`Psi=list(formula=~-1+tostratum)`

`Psi=list(formula=~-1+stratum:tostratum)`

I'm posting my answer below which may be useful for others:

`Psi=list(formula=~-1+stratum)`

The -1 here is not necessary. It simply removes the intercept and replaces it with S parameters, where S is the number of strata. ~stratum specifies that movement is the same across tostratum values from each stratum but can differ among strata. All that matters is where you are, not where you are going.

`Psi=list(formula=~-1+tostratum)`

The -1 here is also not necessary. It simply removes the intercept and replaces it with S parameters, where S is the number of strata. ~tostratum specifies that movement is the same across stratum values for each tostratum value but can differ among tostratum values. All that matters is where you are going, not where you are.

`Psi=list(formula=~-1+stratum:tostratum)`

Here the -1 is necessary. This formula allows different values depending on where you are at and where you are going and typically makes the most sense.
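A quick way to see what each formula asks for is to build the design matrix by hand with base R's `model.matrix`. This is only a sketch: the three-stratum grid `dd` is made up for illustration and is not design data produced by marked, and it ignores the rows that are fixed, so the number of parameters actually estimated in a real fit may be smaller than these column counts.

```r
# Hypothetical grid of all from/to combinations for 3 strata (not marked output)
strata <- c("A", "B", "C")
dd <- expand.grid(tostratum = strata, stratum = strata)

# Design matrices implied by each formula
X_from_int <- model.matrix(~ stratum, dd)                  # intercept + contrasts
X_from     <- model.matrix(~ -1 + stratum, dd)             # one column per stratum
X_to       <- model.matrix(~ -1 + tostratum, dd)           # one column per tostratum
X_both     <- model.matrix(~ -1 + stratum:tostratum, dd)   # one column per pair
X_both_int <- model.matrix(~ stratum:tostratum, dd)        # same, plus an intercept

ncol(X_from_int)    # 3: same number of columns with or without the -1
ncol(X_from)        # 3
ncol(X_to)          # 3
ncol(X_both)        # 9
ncol(X_both_int)    # 10 columns, but ...
qr(X_both_int)$rank # ... only rank 9, so the intercept is redundant
```

With the interaction alone, keeping the intercept adds a column that is the sum of all the others, which is one way to read why the -1 matters in that case but not in the first two.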

There is an important difference between the way marked and RMark work with regard to mlogit parameters like Psi. mlogit parameters are a set of probabilities that must sum to 1. For Psi, if I'm in stratum A and can go to B or C or remain in A, the probabilities A to A, A to B and A to C must sum to 1 because those are all of the possibilities. In RMark/MARK the design data would only contain 2 records, which are determined by what you select as subtract.stratum. If subtract.stratum was set as A for the A stratum, the design data for Psi would only contain records for A to B and A to C. The value for A to A would be computed by subtraction.

In marked, all 3 records are in the design data and the default of staying in A (A to A) has a value of fix=1 which makes it computed by subtraction. I did this for 2 reasons. Firstly, that way you get a real parameter estimate for the subtracted stratum which you don't get in RMark/MARK. Secondly, you can change the value to be subtracted at will and it is not fixed across the entire model fit, but you do have to be careful when specifying the model when you do that because the formula specifies the parameters for those that are not fixed.
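To make "computed by subtraction" concrete, here is a minimal base R sketch of the mlogit back-transform for the three moves out of stratum A. The link-scale values are arbitrary numbers picked for illustration, and the vector names are hypothetical, not anything marked produces.

```r
# Link-scale values for moves out of A. The A-to-A row is the reference:
# its link value is 0, and exp(0) = 1 is what fix=1 refers to.
link <- c(AtoA = 0, AtoB = -1.5, AtoC = -2.0)

# mlogit back-transform: exponentiate, then divide by the sum within the stratum
psi <- exp(link) / sum(exp(link))
psi

# The three probabilities sum to 1 ...
sum(psi)

# ... so the reference row equals 1 minus the other two
all.equal(unname(psi["AtoA"]), 1 - unname(psi["AtoB"] + psi["AtoC"]))
```

Because all three rows are present, the reference row gets its own real value, which is the first of the two reasons given above.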
jlaake


### Re: mlogit formula specification

Hi,
Thanks again Jeff for the very informative explanation.

So if I understand correctly, each model could be interpreted as follows:
1. Psi=list(formula=~stratum)
States that the probability of remaining in state i versus changing from state i to any other state varies across ‘strata’.
2. Psi=list(formula=~tostratum)
States that the probability of remaining in state i versus reaching state i, regardless of the previous state, varies across ‘tostrata’.
3. Psi=list(formula=~-1+stratum:tostratum)
States that transition probabilities differ among specific pairs of states.

Is this correct?

For a dataset I’m exploring, predicted values for a model including Psi=list(formula=~stratum) seem to match the interpretation above well:

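| | stratum | tostratum | occ | reals | estimate | se | lcl | ucl |
|---|---|---|---|---|---|---|---|---|
| 1 | A | A | 7 | 0.727 | 0.727 | 0.148 | 0.383 | 0.919 |
| 2 | A | Q | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 3 | A | M | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 4 | A | K | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 5 | A | L | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 6 | A | F | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 7 | A | P | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 8 | A | D | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 9 | A | C | 7 | 0.034 | 0.034 | 0.018 | 0.012 | 0.096 |
| 10 | Q | A | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 11 | Q | Q | 7 | 0.527 | 0.527 | 0.149 | 0.257 | 0.782 |
| 12 | Q | M | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 13 | Q | K | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 14 | Q | L | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 15 | Q | F | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 16 | Q | P | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 17 | Q | D | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |
| 18 | Q | C | 7 | 0.059 | 0.059 | 0.019 | 0.032 | 0.108 |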
(...)

Here, for instance, if I understood correctly, the probability of remaining in state A versus changing from A to any other state would be 0.727 and 0.272 (8*0.034), respectively.

However, for a model including Psi=list(formula=~tostratum), according to the interpretation above, I would expect predicted outputs as below:

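| | stratum | tostratum | estimate |
|---|---|---|---|
| 1 | A | A | value1 |
| 2 | Q | A | value2 |
| 3 | M | A | value2 |
| 4 | K | A | value2 |
| 5 | L | A | value2 |
| 6 | F | A | value2 |
| 7 | P | A | value2 |
| 8 | D | A | value2 |
| 9 | C | A | value2 |
| 10 | A | Q | value3 |
| 11 | Q | Q | value4 |
| 12 | M | Q | value4 |
| 13 | K | Q | value4 |
| 14 | L | Q | value4 |
| 15 | F | Q | value4 |
| 16 | P | Q | value4 |
| 17 | D | Q | value4 |
| 18 | C | Q | value4 |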
(...)

Here, for instance, the probability of remaining in A versus reaching A from any other previous state would be value1 and 8*value2, respectively. But I am probably not interpreting this correctly, as I got the following predicted outputs using Psi=list(formula=~tostratum):

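| | stratum | tostratum | occ | reals | estimate | se | lcl | ucl |
|---|---|---|---|---|---|---|---|---|
| 1 | A | A | 7 | 0.634 | 0.634 | 0.068 | 0.493 | 0.755 |
| 2 | A | Q | 7 | 0.032 | 0.032 | 0.023 | 0.008 | 0.123 |
| 3 | A | M | 7 | 0.160 | 0.160 | 0.050 | 0.084 | 0.283 |
| 4 | A | K | 7 | 0.057 | 0.057 | 0.029 | 0.021 | 0.146 |
| 5 | A | L | 7 | 0.015 | 0.015 | 0.015 | 0.002 | 0.100 |
| 6 | A | F | 7 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| 7 | A | P | 7 | 0.071 | 0.071 | 0.036 | 0.026 | 0.180 |
| 8 | A | D | 7 | 0.031 | 0.031 | 0.022 | 0.008 | 0.120 |
| 9 | A | C | 7 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| 10 | Q | A | 7 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 11 | Q | Q | 7 | 0.655 | 0.655 | 0.069 | 0.512 | 0.775 |
| 12 | Q | M | 7 | 0.165 | 0.165 | 0.052 | 0.087 | 0.291 |
| 13 | Q | K | 7 | 0.059 | 0.059 | 0.029 | 0.021 | 0.151 |
| 14 | Q | L | 7 | 0.015 | 0.015 | 0.015 | 0.002 | 0.103 |
| 15 | Q | F | 7 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| 16 | Q | P | 7 | 0.073 | 0.073 | 0.037 | 0.027 | 0.186 |
| 17 | Q | D | 7 | 0.032 | 0.032 | 0.023 | 0.008 | 0.124 |
| 18 | Q | C | 7 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |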

This suggests specific probabilities of transition among states, as would be expected from a model specifying Psi=list(formula=~stratum:tostratum), so now I’m a bit confused...

Could you please give me a clue about what I might be missing?

Would a model including Psi=list(formula=~stratum+tostratum) provide the probabilities of remaining in state i versus reaching any other state from i versus reaching state i from any other state?

I know that in multistate models the most informative approach would probably include Psi=list(formula=~-1+stratum:tostratum), but I suspect my dataset includes too many possible states, which is likely to lead to overparameterization problems, so I’m exploring other ways of approaching my main research goals.

I’m sorry if these are silly questions (I’m new to both multistate CR models and marked), and I hope you can offer some guidance on transition model formulation and interpretation.

Thanks a lot
Ricardo
ricardo


### Re: mlogit formula specification

Sorry for my slow response. I was on leave and then forgot about this message. I should have been clearer that my interpretations were in regard to the link scale rather than the real scale. With regard to the ~tostratum case, let me create a simple example:

| From | To | log value |
|---|---|---|
| A | A | 0 |
| A | B | b0 |
| A | C | b1 |
| B | B | 0 |
| B | A | b2 |
| B | C | b1 |
| C | C | 0 |
| C | A | b2 |
| C | B | b0 |

Now when it computes the Psi values it exponentiates each quantity (exp(0)=1 which is why fix=1 for the subtracted stratum) and then divides by the sum of the quantities within each stratum. That is the definition of an mlogit parameter. But as you can see what is summed differs for each stratum resulting in different reals.

From A, Psi A to B is exp(b0)/(1+exp(b0)+exp(b1)), whereas Psi C to B is exp(b0)/(1+exp(b0)+exp(b2)), which will yield a different real value. That may not be what you expected, but that is the model. The interpretation is a bit strange, so it may not be useful.
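The point can be checked numerically with a few lines of base R. This is a sketch only: the values of b0, b1 and b2 are arbitrary, and the helper `mlogit` is a hypothetical name defined here, not a function from marked.

```r
# Arbitrary link-scale values (illustration only):
# b0 applies to moves to B, b1 to moves to C, b2 to moves to A
b0 <- -1.0
b1 <- -2.0
b2 <- -0.5

# Exponentiate and divide by the sum within the origin stratum
mlogit <- function(x) exp(x) / sum(exp(x))

# Default reference: staying put has link value 0 in every stratum
from_A <- mlogit(c(A = 0,  B = b0, C = b1))
from_B <- mlogit(c(A = b2, B = 0,  C = b1))
from_C <- mlogit(c(A = b2, B = b0, C = 0))

# The real values for moving to B differ between origins A and C ...
from_A["B"]
from_C["B"]

# ... but the ratio of moving to B versus staying is exp(b0) from both
from_A["B"] / from_A["A"]
from_C["B"] / from_C["C"]
exp(b0)
```

The estimates posted earlier in the thread show the same pattern: moving to M relative to staying is 0.160/0.634 from A and 0.165/0.655 from Q, both about 0.25, even though the reals themselves differ.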

Now what you can do is to fix the subtracted stratum to be the same across strata to solve this problem.

| From | To | log value |
|---|---|---|
| A | A | 0 |
| A | B | b0 |
| A | C | b1 |
| B | B | b0 |
| B | A | 0 |
| B | C | b1 |
| C | C | b1 |
| C | A | 0 |
| C | B | b0 |

Now the sums will all be the same and reals will be the same going to each stratum. To do this you need to set fix=1 for A to A, B to A and C to A rather than using the default which is A to A, B to B, and C to C.
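As a rough illustration of that change, the sketch below builds a mock Psi design data frame in base R and moves the fixed rows from "staying" to "going to A". The data frame `psi_dd`, the use of NA for rows that are estimated, and the values in `b` are all assumptions made for this example; in a real analysis the design data would come from marked and you would edit its fix column instead.

```r
# Mock Psi design data for three strata (hypothetical, not marked output)
strata <- c("A", "B", "C")
psi_dd <- expand.grid(tostratum = strata, stratum = strata,
                      stringsAsFactors = FALSE)

# Default reference: staying in the current stratum (A to A, B to B, C to C)
psi_dd$fix <- ifelse(psi_dd$stratum == psi_dd$tostratum, 1, NA)

# Common reference instead: A to A, B to A and C to A are the fixed rows
psi_dd$fix <- ifelse(psi_dd$tostratum == "A", 1, NA)

# Link values under ~tostratum: 0 for the fixed rows, b0 for B, b1 for C
# (arbitrary numbers, chosen only for illustration)
b <- c(A = 0, B = -1.0, C = -2.0)
psi_dd$link <- unname(b[psi_dd$tostratum])

# mlogit back-transform within each origin stratum
psi_dd$real <- ave(exp(psi_dd$link), psi_dd$stratum,
                   FUN = function(x) x / sum(x))
psi_dd
```

With the reference shared across strata, every origin divides by the same sum, so the real value for each tostratum is identical whatever the origin.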

Hopefully this is now clear. --jeff
jlaake
