Explicit grouping variables when extracting real values

posts related to the RMark library, which may not be of general interest to users of 'classic' MARK

Explicit grouping variables when extracting real values

Postby Henrik-P » Tue Apr 23, 2024 10:21 am

I fit a model using RMark::mark with some grouping variables. When I extract the real Phi parameters from the model object, using
Code: Select all
the-name-of-the-model$results$real
, the values of the grouping variables are given as concatenated row names. Although I can use some string processing to extract the values of the groups, I wonder if there is a simpler way.

For comparison, when extracting the real parameters from a model fitted using marked::crm, the grouping variables are given in separate columns, as desired. However, now only the estimates are shown, but not se, lcl and ucl.

Here's a small example using the dipper data. Models are fitted with both RMark and marked to obtain sex- and time-specific estimates.

Create processed data and design data:
Code: Select all
library(RMark)
data(dipper)

# process data
proc_rmark = RMark::process.data(data = dipper, groups = ("sex"))
proc_marked = marked::process.data(data = dipper)

# create design data
design_rmark = RMark::make.design.data(proc_rmark)
design_marked = marked::make.design.data(proc_marked)



Fit models:
Code: Select all
m_rmark = RMark::mark(
    data = proc_rmark,
    ddl = design_rmark,
    model.parameters = list(
        Phi = list(formula = ~sex + time),
        p = list(formula = ~1)))

m_marked = marked::crm(
    data = proc_marked,
    ddl = design_marked,
    model.parameters = list(
        Phi = list(formula = ~sex + time),
        p = list(formula = ~1)),
    accumulate = FALSE)

m_marked = marked::cjs.hessian(m_marked)


Get real values from models:
RMark:
Code: Select all
m_rmark$results$real
                estimate              se            lcl            ucl
Phi gFemale c1 a0 t1 0.6184423 0.1154337 0.3832299 0.8087247
Phi gFemale c1 a1 t2 0.4480347 0.0699014 0.3180764 0.5855008
Phi gFemale c1 a2 t3 0.4717226 0.0628891 0.3525469 0.5942113
Phi gFemale c1 a3 t4 0.6179090 0.0617793 0.4919537 0.7297892
Phi gFemale c1 a4 t5 0.6010942 0.0601805 0.4795367 0.7113504
Phi gFemale c1 a5 t6 0.5762779 0.0624635 0.4516836 0.6918740
Phi gMale c1 a0 t1   0.6317731 0.1128463 0.3986916 0.8161656
Phi gMale c1 a1 t2   0.4621412 0.0724222 0.3267790 0.6033250
Phi gMale c1 a2 t3   0.4859184 0.0642372 0.3634640 0.6100871
Phi gMale c1 a3 t4   0.6312473 0.0615073 0.5049171 0.7418247
Phi gMale c1 a4 t5   0.6146534 0.0593749 0.4938954 0.7227713
Phi gMale c1 a5 t6   0.5901054 0.0617602 0.4660395 0.7036738
p gFemale c1 a1 t2   0.9021693 0.0290292 0.8287662 0.9461511


marked:
Code: Select all
m_marked$results$real$Phi
sex time occ  estimate
1  Female    6   6 0.5762781
2    Male    6   6 0.5901055
3  Female    5   5 0.6010941
4    Male    5   5 0.6146532
5  Female    4   4 0.6179089
6    Male    4   4 0.6312471
7  Female    3   3 0.4717224
8    Male    3   3 0.4859181
9  Female    2   2 0.4480346
10   Male    2   2 0.4621411
11 Female    1   1 0.6184402
12   Male    1   1 0.6317710



Using RMark
Pros: values of estimates are given with se, lcl and ucl.
Cons: values of the grouping variables (sex and time) are given as concatenated string in rownames.

Using marked
Pros: values of grouping variables (sex and time) are given in separate columns.
Cons: only estimates, no se, lcl and ucl

What would be the canonical way to get estimates of real parameters with se, lcl and ucl, and the grouping variables as explicit columns? Although the main part of my work is performed in RMark, I'm happy to see solutions for both RMark and marked.

Thank you for your great work!
Last edited by Henrik-P on Wed Apr 24, 2024 4:25 am, edited 2 times in total.
Henrik-P
 
Posts: 3
Joined: Tue Apr 23, 2024 6:55 am

Re: Explicit grouping variables when extracting real values

Postby jlaake » Tue Apr 23, 2024 5:18 pm

Unless you want to use a model in marked that is not in MARK, then I highly suggest you use RMark/MARK. The latter is much more developed and complete. The marked package was NEVER intended to replace or compete with MARK which has far more capabilities than marked. My intention for marked was proof of concept to provide open source to show how these models can be developed particularly within the context of hidden markov models. If you look at the code for marked you will see its evolution from using FORTRAN with optim to ADMB and then to TMB with the focus on hidden markov model approach.

To address your questions, with RMark you can bind the design data for a parameter with the reals for that parameter. You can limit the design data columns you want to give you whatever data you want. This will give you all of the estimates and not the unique ones

Code: Select all
cbind(design_rmark$Phi,summary( m_rmark,se=TRUE)$reals$Phi)


m_rmark$results$real gives you the unique ones and you can split off values with

Code: Select all
t(sapply(strsplit(rownames(x)," "),function(z)z))
      [,1]  [,2]      [,3] [,4] [,5]
 [1,] "Phi" "gFemale" "c1" "a0" "t1"
 [2,] "Phi" "gFemale" "c1" "a1" "t2"
 [3,] "Phi" "gFemale" "c1" "a2" "t3"
 [4,] "Phi" "gFemale" "c1" "a3" "t4"
 [5,] "Phi" "gFemale" "c1" "a4" "t5"
 [6,] "Phi" "gFemale" "c1" "a5" "t6"
 [7,] "Phi" "gMale"   "c1" "a0" "t1"
 [8,] "Phi" "gMale"   "c1" "a1" "t2"
 [9,] "Phi" "gMale"   "c1" "a2" "t3"
[10,] "Phi" "gMale"   "c1" "a3" "t4"
[11,] "Phi" "gMale"   "c1" "a4" "t5"
[12,] "Phi" "gMale"   "c1" "a5" "t6"
[13,] "p"   "gFemale" "c1" "a1" "t2"
>

but with the simplification process with PIMS some of these fields don't always reflect anything useful. For example a (age) isn't meaningful. The better approach is to merge with the design data and then find unique values if that is what you want. You can use duplicated function or something else.

WIth marked simply include hessian=TRUE to get the std errors
Code: Select all
> m_marked = marked::crm(
+     data = proc_marked,
+     ddl = design_marked,
+     model.parameters = list(
+         Phi = list(formula = ~sex + time),
+         p = list(formula = ~1)),
+     accumulate = FALSE,hessian=TRUE)
Computing initial parameter estimates

Starting optimization for 8 parameters...
 Number of evaluations:  500  -2lnl: 659.7797137Computing hessian...
 Number of evaluations:  200  -2lnl: 659.6497384
Elapsed time in minutes:  0.0183
[quote]
> m_marked$results$reals
$Phi
      sex time occ  estimate         se       lcl       ucl
1  Female    6   6 0.5762781 0.06246351 0.4516838 0.6918741
2    Male    6   6 0.5901055 0.06176023 0.4660395 0.7036739
3  Female    5   5 0.6010941 0.06018054 0.4795366 0.7113503
4    Male    5   5 0.6146532 0.05937495 0.4938953 0.7227712
5  Female    4   4 0.6179089 0.06177980 0.4919525 0.7297898
6    Male    4   4 0.6312471 0.06150784 0.5049158 0.7418254
7  Female    3   3 0.4717224 0.06288908 0.3525467 0.5942110
8    Male    3   3 0.4859181 0.06423722 0.3634637 0.6100868
9  Female    2   2 0.4480346 0.06990141 0.3180764 0.5855008
10   Male    2   2 0.4621411 0.07242222 0.3267788 0.6033249
11 Female    1   1 0.6184402 0.11543784 0.3832203 0.8087283
12   Male    1   1 0.6317710 0.11285033 0.3986818 0.8161690

$p
  occ  estimate         se       lcl      ucl
1   7 0.9021693 0.02902918 0.8287662 0.946151[/quote]
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Explicit grouping variables when extracting real values

Postby Henrik-P » Wed Apr 24, 2024 4:17 am

Thanks a lot for your swift reply.

Just a short comment on the marked solution where you set hessian = TRUE in crm; I had worked through the 'Dipper data example' in the marked vignette. There, instead of setting hessian = TRUE in crm (i.e. keeping the default FALSE), cjs.hessian is applied on the model to compute the variance-covariance matrix from the hessian. Then, se, lcl, ucl of the estimates are printed in the model output. I erroneously assumed that the same method could be applied to obtain se, lcl, ucl for the reals (hence my use of marked::cjs.hessian(m_marked)).

Cheers
Henrik-P
 
Posts: 3
Joined: Tue Apr 23, 2024 6:55 am

Re: Explicit grouping variables when extracting real values

Postby Henrik-P » Fri Apr 26, 2024 10:20 am

I was considering an alternative:

Unique (simplified) real params are extracted from model object, and all real params from summary object. Select relevant columns (here, skip lcl and ucl for brevity; include estimate from summary object just for sanity check).

Add grouping variables from summary data to reals from model object by joining on row names.

Code: Select all
merge(
    m_rmark$results$real[ , c("estimate", "se")],
    summary(m_rmark, se = TRUE)$reals$Phi[ , c("estimate", "sex", "time")],
    by = "row.names"
    )


The output looks OK.
Code: Select all
                    Row.names estimate.x        se estimate.y    sex time
1  Phi gFemale c2017 a0 t2017  0.6184423 0.1154337  0.6184423 Female 2017
2  Phi gFemale c2017 a1 t2018  0.4480347 0.0699014  0.4480347 Female 2018
3  Phi gFemale c2017 a2 t2019  0.4717226 0.0628891  0.4717226 Female 2019
4  Phi gFemale c2017 a3 t2020  0.6179090 0.0617793  0.6179090 Female 2020
5  Phi gFemale c2017 a4 t2021  0.6010942 0.0601805  0.6010942 Female 2021
6  Phi gFemale c2017 a5 t2022  0.5762779 0.0624635  0.5762779 Female 2022
7    Phi gMale c2017 a0 t2017  0.6317731 0.1128463  0.6317731   Male 2017
8    Phi gMale c2017 a1 t2018  0.4621412 0.0724222  0.4621412   Male 2018
9    Phi gMale c2017 a2 t2019  0.4859184 0.0642372  0.4859184   Male 2019
10   Phi gMale c2017 a3 t2020  0.6312473 0.0615073  0.6312473   Male 2020
11   Phi gMale c2017 a4 t2021  0.6146534 0.0593749  0.6146534   Male 2021
12   Phi gMale c2017 a5 t2022  0.5901054 0.0617602  0.5901054   Male 2022


Please chime in if you spot any flaws.
Henrik-P
 
Posts: 3
Joined: Tue Apr 23, 2024 6:55 am

Re: Explicit grouping variables when extracting real values

Postby jlaake » Sat Apr 27, 2024 12:07 pm

Looks fine to me. There are multiple ways to do this. The simplified indices are stored in the model result. You may also want to look at covariate.predictions and predict_real.

Getting the unique real values was more critical for marked because the possible number of real parameters is n times as large where n is the number of capture histories. This has 2 advantages. First, there is no difference between design and individual covariates like in RMark. Secondly, it provides greater flexibility in model building and fixing parameters.

But it has a huge disadvantage because the memory and computing requirements can get out of hand when n and number of occasions are large. The marked package always uses the equivalent of all different PIMs and no way to reduce the number of real parameters.

Jeff
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA


Return to RMark

Who is online

Users browsing this forum: No registered users and 3 guests

cron