www.phidot.org

by **Henrik-P** » Tue Apr 23, 2024 10:21 am

I fit a model using RMark::mark with some grouping variables. When I extract the real Phi parameters from the model object, using

Code: Select all: the-name-of-the-model$results$real

, the values of the grouping variables are given as concatenated row names. Although I can use some string processing to extract the values of the groups, I wonder if there is a simpler way.

For comparison, when extracting the real parameters from a model fitted using marked::crm, the grouping variables are given in separate columns, as desired. However, now only the estimates are shown, but not se, lcl and ucl.

Here's a small example using the dipper data. Models are fitted with both RMark and marked to obtain sex- and time-specific estimates.

Create processed data and design data:

Code: Select all: library(RMark) data(dipper) # process data proc_rmark = RMark::process.data(data = dipper, groups = ("sex")) proc_marked = marked::process.data(data = dipper) # create design data design_rmark = RMark::make.design.data(proc_rmark) design_marked = marked::make.design.data(proc_marked)

Fit models:

Code: Select all: m_rmark = RMark::mark( data = proc_rmark, ddl = design_rmark, model.parameters = list( Phi = list(formula = ~sex + time), p = list(formula = ~1))) m_marked = marked::crm( data = proc_marked, ddl = design_marked, model.parameters = list( Phi = list(formula = ~sex + time), p = list(formula = ~1)), accumulate = FALSE) m_marked = marked::cjs.hessian(m_marked)

Get real values from models:
RMark:

Code: Select all: m_rmark$results$real estimate se lcl ucl Phi gFemale c1 a0 t1 0.6184423 0.1154337 0.3832299 0.8087247 Phi gFemale c1 a1 t2 0.4480347 0.0699014 0.3180764 0.5855008 Phi gFemale c1 a2 t3 0.4717226 0.0628891 0.3525469 0.5942113 Phi gFemale c1 a3 t4 0.6179090 0.0617793 0.4919537 0.7297892 Phi gFemale c1 a4 t5 0.6010942 0.0601805 0.4795367 0.7113504 Phi gFemale c1 a5 t6 0.5762779 0.0624635 0.4516836 0.6918740 Phi gMale c1 a0 t1 0.6317731 0.1128463 0.3986916 0.8161656 Phi gMale c1 a1 t2 0.4621412 0.0724222 0.3267790 0.6033250 Phi gMale c1 a2 t3 0.4859184 0.0642372 0.3634640 0.6100871 Phi gMale c1 a3 t4 0.6312473 0.0615073 0.5049171 0.7418247 Phi gMale c1 a4 t5 0.6146534 0.0593749 0.4938954 0.7227713 Phi gMale c1 a5 t6 0.5901054 0.0617602 0.4660395 0.7036738 p gFemale c1 a1 t2 0.9021693 0.0290292 0.8287662 0.9461511

marked:

Code: Select all: m_marked$results$real$Phi sex time occ estimate 1 Female 6 6 0.5762781 2 Male 6 6 0.5901055 3 Female 5 5 0.6010941 4 Male 5 5 0.6146532 5 Female 4 4 0.6179089 6 Male 4 4 0.6312471 7 Female 3 3 0.4717224 8 Male 3 3 0.4859181 9 Female 2 2 0.4480346 10 Male 2 2 0.4621411 11 Female 1 1 0.6184402 12 Male 1 1 0.6317710

Using RMark
Pros: values of estimates are given with se, lcl and ucl.
Cons: values of the grouping variables (sex and time) are given as concatenated string in rownames.

Using marked
Pros: values of grouping variables (sex and time) are given in separate columns.
Cons: only estimates, no se, lcl and ucl

What would be the canonical way to get estimates of real parameters with se, lcl and ucl, and the grouping variables as explicit columns? Although the main part of my work is performed in RMark, I'm happy to see solutions for both RMark and marked.

Thank you for your great work!

by **jlaake** » Tue Apr 23, 2024 5:18 pm

Unless you want to use a model in marked that is not in MARK, then I highly suggest you use RMark/MARK. The latter is much more developed and complete. The marked package was NEVER intended to replace or compete with MARK which has far more capabilities than marked. My intention for marked was proof of concept to provide open source to show how these models can be developed particularly within the context of hidden markov models. If you look at the code for marked you will see its evolution from using FORTRAN with optim to ADMB and then to TMB with the focus on hidden markov model approach.

To address your questions, with RMark you can bind the design data for a parameter with the reals for that parameter. You can limit the design data columns you want to give you whatever data you want. This will give you all of the estimates and not the unique ones

Code: Select all: cbind(design_rmark$Phi,summary( m_rmark,se=TRUE)$reals$Phi)

m_rmark$results$real gives you the unique ones and you can split off values with

Code: Select all: t(sapply(strsplit(rownames(x)," "),function(z)z)) [,1] [,2] [,3] [,4] [,5] [1,] "Phi" "gFemale" "c1" "a0" "t1" [2,] "Phi" "gFemale" "c1" "a1" "t2" [3,] "Phi" "gFemale" "c1" "a2" "t3" [4,] "Phi" "gFemale" "c1" "a3" "t4" [5,] "Phi" "gFemale" "c1" "a4" "t5" [6,] "Phi" "gFemale" "c1" "a5" "t6" [7,] "Phi" "gMale" "c1" "a0" "t1" [8,] "Phi" "gMale" "c1" "a1" "t2" [9,] "Phi" "gMale" "c1" "a2" "t3" [10,] "Phi" "gMale" "c1" "a3" "t4" [11,] "Phi" "gMale" "c1" "a4" "t5" [12,] "Phi" "gMale" "c1" "a5" "t6" [13,] "p" "gFemale" "c1" "a1" "t2" >

but with the simplification process with PIMS some of these fields don't always reflect anything useful. For example a (age) isn't meaningful. The better approach is to merge with the design data and then find unique values if that is what you want. You can use duplicated function or something else.

WIth marked simply include hessian=TRUE to get the std errors

Code: Select all: > m_marked = marked::crm( + data = proc_marked, + ddl = design_marked, + model.parameters = list( + Phi = list(formula = ~sex + time), + p = list(formula = ~1)), + accumulate = FALSE,hessian=TRUE) Computing initial parameter estimates Starting optimization for 8 parameters... Number of evaluations: 500 -2lnl: 659.7797137Computing hessian... Number of evaluations: 200 -2lnl: 659.6497384 Elapsed time in minutes: 0.0183 [quote] > m_marked$results$reals $Phi sex time occ estimate se lcl ucl 1 Female 6 6 0.5762781 0.06246351 0.4516838 0.6918741 2 Male 6 6 0.5901055 0.06176023 0.4660395 0.7036739 3 Female 5 5 0.6010941 0.06018054 0.4795366 0.7113503 4 Male 5 5 0.6146532 0.05937495 0.4938953 0.7227712 5 Female 4 4 0.6179089 0.06177980 0.4919525 0.7297898 6 Male 4 4 0.6312471 0.06150784 0.5049158 0.7418254 7 Female 3 3 0.4717224 0.06288908 0.3525467 0.5942110 8 Male 3 3 0.4859181 0.06423722 0.3634637 0.6100868 9 Female 2 2 0.4480346 0.06990141 0.3180764 0.5855008 10 Male 2 2 0.4621411 0.07242222 0.3267788 0.6033249 11 Female 1 1 0.6184402 0.11543784 0.3832203 0.8087283 12 Male 1 1 0.6317710 0.11285033 0.3986818 0.8161690 $p occ estimate se lcl ucl 1 7 0.9021693 0.02902918 0.8287662 0.946151[/quote]

by **Henrik-P** » Wed Apr 24, 2024 4:17 am

Thanks a lot for your swift reply.

Just a short comment on the marked solution where you set hessian = TRUE in crm; I had worked through the 'Dipper data example' in the marked vignette. There, instead of setting hessian = TRUE in crm (i.e. keeping the default FALSE), cjs.hessian is applied on the model to compute the variance-covariance matrix from the hessian. Then, se, lcl, ucl of the estimates are printed in the model output. I erroneously assumed that the same method could be applied to obtain se, lcl, ucl for the reals (hence my use of marked::cjs.hessian(m_marked)).

Cheers

by **Henrik-P** » Fri Apr 26, 2024 10:20 am

I was considering an alternative:

Unique (simplified) real params are extracted from model object, and all real params from summary object. Select relevant columns (here, skip lcl and ucl for brevity; include estimate from summary object just for sanity check).

Add grouping variables from summary data to reals from model object by joining on row names.

Code: Select all: merge( m_rmark$results$real[ , c("estimate", "se")], summary(m_rmark, se = TRUE)$reals$Phi[ , c("estimate", "sex", "time")], by = "row.names" )

The output looks OK.

Code: Select all: Row.names estimate.x se estimate.y sex time 1 Phi gFemale c2017 a0 t2017 0.6184423 0.1154337 0.6184423 Female 2017 2 Phi gFemale c2017 a1 t2018 0.4480347 0.0699014 0.4480347 Female 2018 3 Phi gFemale c2017 a2 t2019 0.4717226 0.0628891 0.4717226 Female 2019 4 Phi gFemale c2017 a3 t2020 0.6179090 0.0617793 0.6179090 Female 2020 5 Phi gFemale c2017 a4 t2021 0.6010942 0.0601805 0.6010942 Female 2021 6 Phi gFemale c2017 a5 t2022 0.5762779 0.0624635 0.5762779 Female 2022 7 Phi gMale c2017 a0 t2017 0.6317731 0.1128463 0.6317731 Male 2017 8 Phi gMale c2017 a1 t2018 0.4621412 0.0724222 0.4621412 Male 2018 9 Phi gMale c2017 a2 t2019 0.4859184 0.0642372 0.4859184 Male 2019 10 Phi gMale c2017 a3 t2020 0.6312473 0.0615073 0.6312473 Male 2020 11 Phi gMale c2017 a4 t2021 0.6146534 0.0593749 0.6146534 Male 2021 12 Phi gMale c2017 a5 t2022 0.5901054 0.0617602 0.5901054 Male 2022

Please chime in if you spot any flaws.

by **jlaake** » Sat Apr 27, 2024 12:07 pm

Looks fine to me. There are multiple ways to do this. The simplified indices are stored in the model result. You may also want to look at covariate.predictions and predict_real.

Getting the unique real values was more critical for marked because the possible number of real parameters is n times as large where n is the number of capture histories. This has 2 advantages. First, there is no difference between design and individual covariates like in RMark. Secondly, it provides greater flexibility in model building and fixing parameters.

But it has a huge disadvantage because the memory and computing requirements can get out of hand when n and number of occasions are large. The marked package always uses the equivalent of all different PIMs and no way to reduce the number of real parameters.

Jeff

www.phidot.org

Explicit grouping variables when extracting real values

Explicit grouping variables when extracting real values

Re: Explicit grouping variables when extracting real values

Re: Explicit grouping variables when extracting real values

Re: Explicit grouping variables when extracting real values

Re: Explicit grouping variables when extracting real values

Who is online