splines with time-varying covariates


Postby SoConfused » Mon Sep 13, 2021 10:48 am

Hello,

I'm trying to do a spline regression on a time-varying individual covariate (in my full data - age, ranging from 1 to 29 years). I found this post on doing splines with RMark http://www.phidot.org/forum/viewtopic.php?f=21&t=2454, and I can add a time-varying covariate, but I can't seem to make them work together. The model runs, but throws a pile of "truncating string with embedded nuls" warnings, and only returns an intercept coefficient for p. Any help would be appreciated.

Code:
library(RMark)
library(splines)

data(dipper)
# add a fake time-varying covariate, really just an "occurrence" in this case
dipper$td1981 <- 1
dipper$td1982 <- dipper$td1981 + 1
dipper$td1983 <- dipper$td1981 + 2
dipper$td1984 <- dipper$td1981 + 3
dipper$td1985 <- dipper$td1981 + 4
dipper$td1986 <- dipper$td1981 + 5

dipper.processed=process.data(dipper,begin.time=1980)
dipper.ddl=make.design.data(dipper.processed)

data1.analysis=function(){   
   Phi.dot=list(formula=~1)
   p.td.spline=list(formula=~bs(td))
   cml=create.model.list("CJS")
   mark.wrapper(cml,data=dipper.processed, ddl=dipper.ddl, delete = FALSE)
           }
          
data1.results <- data1.analysis()
data1.results
SoConfused
 
Posts: 56
Joined: Wed Nov 05, 2014 8:25 am

Re: splines with time-varying covariates

Postby jlaake » Mon Sep 13, 2021 12:20 pm

I should have been clearer when I posted that message: it only works for design covariates, not individual covariates, because MARK enters individual covariates into the design matrix as strings. Since the covariate you are using is age, you don't need to treat it as a time-varying individual covariate, as long as you use age as a grouping variable and assign an initial age (the initial.ages argument of process.data) to each group. With that many ages it may give you lots of groups, but that is the easiest solution for you.

You can use model.matrix with the bs() function to create the design matrix entries for the spline and then use those as your covariates. That gets quite complex with lots of times, but it is doable. If you are going to specify an intercept in the formula, you won't want to keep the intercept column that model.matrix adds alongside the bs() columns.
The easiest solution for you is to use age groups; then you can use bs(Age) in your formula. I'll try to add an addendum to http://www.phidot.org/forum/viewtopic.php?f=21&t=2454 at some point.
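
The model.matrix route mentioned above can be sketched roughly as follows. This is only an illustration of how the spline columns are built and the intercept column dropped; the age vector and the column names bs1, bs2, bs3 are assumptions made for the example, and turning these columns into per-occasion individual covariates for MARK is not shown.

```r
library(splines)

# Hypothetical ages 1..29 (the range mentioned in the original question)
ages <- 1:29

# model.matrix() on a bs() term returns an "(Intercept)" column
# plus one column per spline basis function (3 for the default cubic bs()).
mm <- model.matrix(~ bs(ages))

# Drop the intercept column if the model formula will supply its own intercept.
spline_cols <- mm[, -1, drop = FALSE]
colnames(spline_cols) <- paste0("bs", seq_len(ncol(spline_cols)))  # made-up names

# One row per age, one column per spline term
head(cbind(age = ages, spline_cols))
```

Each row gives the covariate values that an animal of that age would need on a given occasion, which is why the bookkeeping grows quickly with the number of occasions.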
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: splines with time-varying covariates

Postby jlaake » Mon Sep 13, 2021 4:33 pm

You'd better stick with design covariates. I'm not sure how to make that work with individual covariates.

Re: splines with time-varying covariates

Postby jlaake » Mon Sep 13, 2021 7:05 pm

I forgot to mention that the "embedded nul" warning has nothing to do with the splines. It started appearing with R 4.0, from the R function readChar that I use to read in the binary files from MARK. It is just a warning and can be ignored. I fixed it in RMark version 2.2.8 on my GitHub site. You can get the binary .zip file at https://drive.google.com/file/d/1w30j8GfIP-ZbyzF08PIUqHYXb_wdaDom/view?usp=sharing

I'm currently working on v2.2.9, which fixes a couple of other issues that came with the advent of R 4.1. I'll post it to CRAN when I'm done.
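
A quick way to check whether an installed copy already includes the fix is to compare version strings. This is only a sketch; the helper name has_nul_fix is made up for illustration.

```r
# Hypothetical helper: TRUE if a version string is at least 2.2.8,
# the RMark version said above to contain the fix.
has_nul_fix <- function(version) {
  package_version(version) >= "2.2.8"
}

has_nul_fix("2.2.7")
has_nul_fix("2.2.9")

# With RMark installed, the installed version can be checked the same way.
if (requireNamespace("RMark", quietly = TRUE)) {
  has_nul_fix(as.character(utils::packageVersion("RMark")))
}
```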

--jeff

Re: splines with time-varying covariates

Postby SoConfused » Tue Sep 14, 2021 7:33 am

Jeff,

Thank you so much for the responses and the suggestions. I spent a while yesterday poking at it. Apologies for following up with what seems a basic misunderstanding with the use of ages. I have 2 toy examples below, based on dipper data. In the first, the year classes of dippers vary (they're a derivative of an assigned age); in the second, the year class is a single constant value and ages are derived. In both, the range of ages in real$p exceeds the range of ages in the actual data, and the differences differ between the two examples. Why is that happening and what am I missing? (also - am I specifying the init.age correctly relative to the data? It's been a struggle for me)

Derived year class:
Code:
library(RMark)
library(splines)   # bs()
library(ggplot2)   # ggplot(), used for the plot below
library(dplyr)
library(stringr)

data(dipper)
set.seed(0)

dipper <- dipper[sample(1:nrow(dipper), 30, replace = FALSE),]
dipper$age <- sample(1:10, nrow(dipper), replace = TRUE)
begin.time=1980
dipper$FirstYear <- begin.time + as.data.frame(str_locate(dipper$ch, "1"))$start - 1
dipper$YearClass <- dipper$FirstYear - dipper$age
dipper <- arrange(dipper, age)
dipper$ageFac <- as.factor(dipper$age)

dipper.processed=process.data(dipper,begin.time=begin.time, groups = c("YearClass", "ageFac"), age.var = 2, initial.ages = unique(dipper$age))
dipper.ddl=make.design.data(dipper.processed)

data1.analysis=function(){   
   Phi.dot=list(formula=~1)
   p.age.spline=list(formula=~bs(Age))
   cml=create.model.list("CJS")
   mark.wrapper(cml,data=dipper.processed, ddl=dipper.ddl, delete = FALSE)
           }
          
data1.results <- data1.analysis()
data1.results

predp <- summary(data1.results[[1]], se=TRUE)$real$p %>%
      select(Age, estimate, lcl, ucl) %>%
      unique()

pred.sm <- data.frame(spline(predp$Age, predp$estimate, n=100))
pred.sm$lcl <- spline(predp$Age, predp$lcl, n=100)$y
pred.sm$ucl <- spline(predp$Age, predp$ucl, n=100)$y
ggplot(aes(x=x, y=y), data=pred.sm) +
   geom_line() + geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=0.25)
dipper$MaxAge <- 1986 - dipper$FirstYear + dipper$age
max(dipper$MaxAge) # 13, but the plot extends to 15
max(predp$Age)


A single constant year class, derived ages:
Code:
data(dipper)
set.seed(0)
begin.time=1980   # define before it is used to compute FirstYear

dipper <- dipper[sample(1:nrow(dipper), 50, replace = FALSE),]
dipper$YearClass <- 1975
dipper$FirstYear <- begin.time + as.data.frame(str_locate(dipper$ch, "1"))$start - 1
dipper$age <- dipper$FirstYear - dipper$YearClass

dipper <- arrange(dipper, age)
dipper$ageFac <- as.factor(dipper$age)

dipper.processed=process.data(dipper,begin.time=begin.time, groups = c("YearClass", "ageFac"), age.var = 2, initial.ages = unique(dipper$age))
dipper.ddl=make.design.data(dipper.processed)

data1.analysis=function(){   
   Phi.dot=list(formula=~1)
   p.age.spline=list(formula=~bs(Age))
   cml=create.model.list("CJS")
   mark.wrapper(cml,data=dipper.processed, ddl=dipper.ddl, delete = FALSE)
           }
          
data1.results <- data1.analysis()
data1.results

predp <- summary(data1.results[[1]], se=TRUE)$real$p %>%
      select(Age, estimate, lcl, ucl) %>%
      unique()

pred.sm <- data.frame(spline(predp$Age, predp$estimate, n=100))
pred.sm$lcl <- spline(predp$Age, predp$lcl, n=100)$y
pred.sm$ucl <- spline(predp$Age, predp$ucl, n=100)$y
ggplot(aes(x=x, y=y), data=pred.sm) +
   geom_line() + geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=0.25)
dipper$MaxAge <- 1986 - dipper$FirstYear + dipper$age
max(dipper$MaxAge) # 11, but the plot extends to 17
max(predp$Age)

Re: splines with time-varying covariates

Postby jlaake » Tue Sep 14, 2021 8:01 pm

I'm not sure what you mean by year class. Let me show a simpler example with 3 initial ages. The initial age is the age of an animal when it is first captured. In the design data, age is incremented by the time interval (typically 1) for each occasion after the first capture occasion. Here is an example with initial.ages 0, 1, 2, using just a snippet of the dipper data. The first 5 records were age 0 when first captured, the second 5 were age 1 when first captured, and the last 5 were age 2 when first captured.

Code:
data(dipper)
dip=dipper[c(1:5,105:109,290:294),]
dip$facage=factor(c(rep(0,5),rep(1,5),rep(2,5)))
str(dip)
dp=process.data(dip,model="CJS",groups=c("sex","facage"),age.var=2,initial.ages=c(0,1,2))
ddl=make.design.data(dp)


Here are some of the design data records for Phi for age 0 females. Any age 0 female caught on the first occasion (cohort=1) is age 0 for the Phi interval from time 1 to 2 and age 1 for the interval from time 2 to 3. Now look down to row 7, for a female of age 0 first caught on the second occasion (cohort=2): she is age 0 for the interval from time 2 to 3, etc.

Code:
 ddl$Phi[1:21,3:12]

      group cohort age time occ.cohort Cohort Age Time    sex facage
1   Female0      1   0    1          1      0   0    0 Female      0
2   Female0      1   1    2          1      0   1    1 Female      0
3   Female0      1   2    3          1      0   2    2 Female      0
4   Female0      1   3    4          1      0   3    3 Female      0
5   Female0      1   4    5          1      0   4    4 Female      0
6   Female0      1   5    6          1      0   5    5 Female      0
7   Female0      2   0    2          2      1   0    1 Female      0
8   Female0      2   1    3          2      1   1    2 Female      0
9   Female0      2   2    4          2      1   2    3 Female      0
10  Female0      2   3    5          2      1   3    4 Female      0
11  Female0      2   4    6          2      1   4    5 Female      0
12  Female0      3   0    3          3      2   0    2 Female      0
13  Female0      3   1    4          3      2   1    3 Female      0
14  Female0      3   2    5          3      2   2    4 Female      0
15  Female0      3   3    6          3      2   3    5 Female      0
16  Female0      4   0    4          4      3   0    3 Female      0
17  Female0      4   1    5          4      3   1    4 Female      0
18  Female0      4   2    6          4      3   2    5 Female      0
19  Female0      5   0    5          5      4   0    4 Female      0
20  Female0      5   1    6          5      4   1    5 Female      0
21  Female0      6   0    6          6      5   0    5 Female      0



Now look at males first caught at age 2 on occasion 1 and occasion 2. Because they were age 2 at the first occasion (cohort 1), they are age 2 for the time interval from 1 to 2, age 3 from time 2 to 3, etc. The age variable in the design data is incremented using the initial age value and the time interval, based on their cohort (the occasion on which they were first captured). You can then use bs(Age), but not bs(age), because age is a factor variable.

Code:
   group cohort age time occ.cohort Cohort Age Time  sex facage
85 Male2      1   2    1          1      0   2    0 Male      2
86 Male2      1   3    2          1      0   3    1 Male      2
87 Male2      1   4    3          1      0   4    2 Male      2
88 Male2      1   5    4          1      0   5    3 Male      2
89 Male2      1   6    5          1      0   6    4 Male      2
90 Male2      1   7    6          1      0   7    5 Male      2
91 Male2      2   2    2          2      1   2    1 Male      2
92 Male2      2   3    3          2      1   3    2 Male      2
93 Male2      2   4    4          2      1   4    3 Male      2
94 Male2      2   5    5          2      1   5    4 Male      2
95 Male2      2   6    6          2      1   6    5 Male      2
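
The incrementing rule described above can be sketched in a few lines of plain R. This is only an illustration of the arithmetic, not RMark's own code; the function name age_by_occasion and its arguments are made up here.

```r
# Age in the design data = initial age + time elapsed since the first capture occasion.
# n_intervals = 6 matches the 6 Phi intervals of the dipper data (7 occasions).
age_by_occasion <- function(initial_age, cohort, n_intervals = 6, time_interval = 1) {
  occ <- seq(cohort, n_intervals)
  data.frame(cohort = cohort,
             time   = occ,
             Age    = initial_age + (occ - cohort) * time_interval)
}

# Males of initial age 2, first caught on occasion 1 and on occasion 2
age_by_occasion(initial_age = 2, cohort = 1)
age_by_occasion(initial_age = 2, cohort = 2)
```

The Age columns produced by the two calls correspond to rows 85-90 and 91-95 of the design data shown above.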

Re: splines with time-varying covariates

Postby SoConfused » Wed Sep 15, 2021 7:51 am

Ah-ha, I finally know how to ask what I'm trying to figure out (and huge thanks for your patience).

The design data contain animals that don't exist in my data. I have yet another dipper example below. In it, I have 2 age-9 animals, which are only encountered in later occasions. But in the ddl there are age-9 animals starting from occasion 1, which extends the range of ages beyond the ages actually observed. This creates a problem for me, because it "whips" that end of the spline up, where ages are higher than anything observed in the data proper. Toy example below, then an image from my actual data: I only have animals up to age 25, but the model is estimating effects up to age 37, and that end of the spline is useless to me. How do I deal with that? Just to clarify: I can easily grab the ages I need from the estimates, but does the modeling of the right-hand tail not affect the estimates in the intermediate age range?

Code:

library(RMark)
library(dplyr)
library(stringr)
library(splines)

data(dipper)
set.seed(0)

dipper <- dipper[sample(1:nrow(dipper), 30, replace = FALSE),]
dipper$age <- sample(1:10, nrow(dipper), replace = TRUE)
begin.time=1980
dipper <- arrange(dipper, age)
dipper$ageFac <- as.factor(dipper$age)

dipper.processed=process.data(dipper,begin.time=begin.time, groups = c("sex", "ageFac"), age.var = 2, initial.ages = unique(dipper$age))
dipper.ddl=make.design.data(dipper.processed)

data1.analysis=function(){   
   Phi.dot=list(formula=~1)
   p.age.spline=list(formula=~bs(Age))
   cml=create.model.list("CJS")
   mark.wrapper(cml,data=dipper.processed, ddl=dipper.ddl, delete = FALSE)
           }
          
data1.results <- data1.analysis()
data1.results

predp <- summary(data1.results[[1]], se=TRUE)$real$p %>%
      select(Age, estimate, lcl, ucl) %>%
      unique()


Compare the observed ages (the maximum age each animal attains by the end of sampling) with the ages in the ddl.
Code:
dipper$FirstYear <- begin.time + as.data.frame(str_locate(dipper$ch, "1"))$start - 1 # first year animal is recorded
dipper$MaxAge <- 1986 - dipper$FirstYear + dipper$age
max(dipper$MaxAge) # 13
max(predp$Age) # 15 - a wider range of ages than actually observed

dipper.ddl$p %>% filter(Age == 15) # look at which ddl animals reach age 15 - these have ageFac of 9 and belong to cohort 1980
dipper %>% filter(ageFac == 9) # check the data - all ageFac 9 animals were caught in late sessions


[Image from the poster's actual data; not preserved]

Re: splines with time-varying covariates

Postby jlaake » Wed Sep 15, 2021 2:49 pm

That is certainly possible; splines can do that, especially at higher ages, because the sample size declines with mortality. But does it really matter? You can always display only the range of ages you have in the data. If there are no animals at those larger ages, the spline values there are not being used for anything in the model fit. I'm not sure what you are concerned about.
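
Restricting the display to the observed ages can be sketched as follows. The predp data frame here is a made-up stand-in with the same column names as the one built in the posts above, and max_observed_age is a hypothetical name for the largest age actually reached in the data.

```r
# Made-up stand-in for the table of real estimates (NOT actual model output)
predp <- data.frame(Age      = 1:15,
                    estimate = seq(0.9, 0.2, length.out = 15))

# Largest age actually reached by any animal, e.g. max(dipper$MaxAge) in the toy example
max_observed_age <- 13

# Keep only the ages supported by data before plotting or reporting
predp_obs <- subset(predp, Age <= max_observed_age)
range(predp_obs$Age)
```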

--jeff

Re: splines with time-varying covariates

Postby SoConfused » Wed Sep 15, 2021 4:45 pm

Well, sounds like I'm all good to go then. Huge thanks again for all the help!

Re: splines with time-varying covariates

Postby SoConfused » Tue Sep 21, 2021 9:32 am

Sorry, one more question (and obviously basic, at that) - my actual model is a POPAN, and the resulting ddl structure is quite different - there are no young ages in later sampling occasions. So - if I was running a ~Age+time model, how would I get the predictions for all ages and all times, since I can no longer access the full age range for each sampling occasion from the summary(model)$real$p? It doesn't look like I can use covariate.predictions, since they're restricted by the init.age grouping (so I only get a prediction for that one age across time). I'm not 100% sure how to predict from a set of spline coefficients here... How do I get the real values for all ages in all sampling occasions?

Using your last example:
Code:
data(dipper)
dip=dipper[c(1:5,105:109,290:294),]
dip$facage=factor(c(rep(0,5),rep(1,5),rep(2,5)))
str(dip)
# CJS ddl data - all ages available in the last sampling occasion in the ddl
dp=process.data(dip,model="CJS",groups=c("sex","facage"),age.var=2,initial.ages=c(0,1,2))
ddl=make.design.data(dp)
ddl$p[ddl$p$time == 7,]

# POPAN - only ages 6-8 available in the last sampling occasion in the ddl
dp=process.data(dip,model="POPAN",groups=c("sex","facage"),age.var=2,initial.ages=c(0,1,2))
ddl=make.design.data(dp)
ddl$p[ddl$p$time == 7,]
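
One generic way to evaluate a fitted spline at ages that are missing from a given occasion in the ddl is to rebuild the basis with predict() on the bs() object and apply the inverse logit link by hand. The following is a sketch under assumptions, not an RMark feature: the coefficients in beta are invented, the age range fit_ages is hypothetical and would need to span the same Age values as the design data used for fitting, and time effects and standard errors are left out.

```r
library(splines)

# Ages present in the design data used for fitting (hypothetical range)
fit_ages  <- 0:8
basis_fit <- bs(fit_ages)   # same call as in a formula ~bs(Age)

# Made-up logit-scale coefficients: intercept + 3 spline terms (NOT real estimates)
beta <- c(0.5, -1.0, 0.8, -2.0)

# Evaluate the same basis (same knots and boundary knots) at the ages of interest
new_ages <- 0:8
X <- cbind(1, predict(basis_fit, new_ages))

# Inverse logit link gives values on the real (probability) scale
p_hat <- plogis(drop(X %*% beta))
data.frame(Age = new_ages, p = p_hat)
```

With an additive time effect in the model, the matching time columns and coefficients would have to be added to X and beta in the same way.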
