creating squared covariates within RMark

Post by markmiller » Mon Apr 17, 2023 3:53 pm

I have a multi-season occupancy data set like this, containing 18 time-varying individual covariates:

Code: Select all
1000000000 1   12 9 10 15 20   144 81 100 225 400    9 10 15 20    81 100 225 400 ;
0100100001 1   12 9 10 15 20   144 81 100 225 400    9 10 15 20    81 100 225 400 ;
0101001010 1   12 9 10 15 20   144 81 100 225 400    9 10 15 20    81 100 225 400 ;
1001111111 1   12 9 10 15 20   144 81 100 225 400    9 10 15 20    81 100 225 400 ;
0010000000 1   12 9 10 15 20   144 81 100 225 400    9 10 15 20    81 100 225 400 ;


This data set is read using the following RMark code:

Code: Select all
n.days <- 5
mydata <- convert.inp("fake.inp",
          covariates = c(paste0("ptemp",   seq_len(n.days)),
                         paste0("ptempsq", seq_len(n.days)),
                         paste0("temp",    seq_len(n.days - 1)),
                         paste0("tempsq",  seq_len(n.days - 1))))


The covariates ptemp and ptempsq are for modelling annual detection using one temperature for each of five days. The covariates temp and tempsq are for modelling annual epsilon and gamma using one temperature value for each of four days. The first values of ptemp and ptempsq are discarded when creating temp and tempsq.
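
As a quick check of how the four covariate sets in the first data row relate to one another, here is a small base-R sketch. The vector names are only for illustration; the values are copied from the row above.

```r
# Values of ptemp1..ptemp5 copied from the first data row
ptemp <- c(12, 9, 10, 15, 20)

ptempsq <- ptemp^2      # squared detection covariates
temp    <- ptemp[-1]    # drop day 1 for epsilon and gamma
tempsq  <- temp^2       # squared epsilon/gamma covariates

ptempsq
temp
tempsq

# Total number of individual covariates per row
length(c(ptemp, ptempsq, temp, tempsq))
```

The three derived vectors reproduce the remaining 13 values in the row, which gives the 18 covariates in total.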

Is it possible to restrict the individual covariates in the input data set to ptemp as shown below and create ptempsq, temp and tempsq from inside the RMark R file?

Code: Select all
1000000000 1  12 9 10 15 20 ;
0100100001 1  12 9 10 15 20 ;
0101001010 1  12 9 10 15 20 ;
1001111111 1  12 9 10 15 20 ;
0010000000 1  12 9 10 15 20 ;


I have over 100 sample days and potentially something close to 2500 individual covariates if using the format shown in the first data set above. I thought creating some covariates within the R file might make the input data file more manageable and might prevent potential problems associated with reading enormously long lines of data.

I have found two examples of creating squared terms in the RMark documentation and Appendix C of the MARK book. But neither seems to implement what I am hoping to do. The example input data file 'indcov2.inp' contains mass and sqmass similar to my initial data set above. I also found an example of squaring a term in the ddl file:

"If we wanted to define a model for p that was a function of age and age squared, we could add the age squared variable as: ddl$p$Agesq=ddl$p$Age^2"

I have tried creating the squared terms and the covariates for epsilon and gamma inside the process.data object using the R code below. This code with the second (smaller) input data set returns the same estimates as the code above with the original data set.

Is the approach below an acceptable way of creating covariates inside RMark?

Code: Select all
n.days <- 5
mydata <- convert.inp("fakeb.inp",
          covariates = paste0("ptemp", seq_len(n.days)))

mydata.processed <- process.data(mydata, begin.time = 1, model = "RDOccupEG",
                                 time.intervals = c(rep(c(0,1), (n.days-1)),0))

mydata.processed$data$ptempsq1  <- mydata.processed$data$ptemp1^2
mydata.processed$data$ptempsq2  <- mydata.processed$data$ptemp2^2
mydata.processed$data$ptempsq3  <- mydata.processed$data$ptemp3^2
mydata.processed$data$ptempsq4  <- mydata.processed$data$ptemp4^2
mydata.processed$data$ptempsq5  <- mydata.processed$data$ptemp5^2

mydata.processed$data$temp1     <- mydata.processed$data$ptemp2
mydata.processed$data$temp2     <- mydata.processed$data$ptemp3
mydata.processed$data$temp3     <- mydata.processed$data$ptemp4
mydata.processed$data$temp4     <- mydata.processed$data$ptemp5

mydata.processed$data$tempsq1   <- mydata.processed$data$ptemp2^2
mydata.processed$data$tempsq2   <- mydata.processed$data$ptemp3^2
mydata.processed$data$tempsq3   <- mydata.processed$data$ptemp4^2
mydata.processed$data$tempsq4   <- mydata.processed$data$ptemp5^2


I found the post at the link below, which uses 'merge_design.covariates' to add covariates to the design data (ddl) object. But I am not sure that is appropriate in my case:

viewtopic.php?f=21&t=4377&p=14595&hilit=data.processed#p14595

Thank you for any thoughts on creating time-varying individual covariates from within RMark code.
markmiller
 
Posts: 49
Joined: Fri Nov 08, 2013 6:23 pm

Re: creating squared covariates within RMark

Post by jlaake » Mon Apr 17, 2023 7:26 pm

The ddl is for group, time, cohort, etc. covariates and the data file with the capture history is for individual covariates. Always keep these two separate in your mind. Individual covariates are typically numeric, although you can use a factor variable by creating 0/1 dummy variables for k-1 levels when the individual factor variable has k levels.
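
A minimal sketch of the k-1 dummy-variable idea in base R follows. The data frame, the factor name habitat, and its levels are all made up for illustration; only model.matrix() and cbind() are standard R.

```r
# Hypothetical individual-covariate data; 'habitat' is a made-up factor
# with k = 3 levels, and the capture histories are placeholders.
sites <- data.frame(
  ch      = c("1000000000", "0100100001", "0101001010", "1001111111"),
  habitat = factor(c("forest", "marsh", "field", "marsh")),
  stringsAsFactors = FALSE
)

# With the default treatment contrasts, model.matrix() returns an
# intercept plus k-1 indicator columns; dropping the intercept leaves
# the 0/1 dummy variables. The first level ("field") is the reference.
dummies <- model.matrix(~ habitat, data = sites)[, -1, drop = FALSE]
colnames(dummies)

# Append the dummies so they can be used as numeric individual covariates
sites <- cbind(sites, dummies)
sites
```

The new columns (here habitatforest and habitatmarsh) are plain numeric 0/1 values, so they can be treated like any other individual covariate.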

Now there is no reason to read in data that can be computed from other variables. So yes, read in the temperature variables and then compute the squared values. But do it in the data frame returned by convert.inp, before calling process.data, and instead of the $fieldname approach use column numbers or names; your code will be much simpler. That approach would be even more important if you had more occasions or covariates.
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: creating squared covariates within RMark

Post by markmiller » Tue Apr 18, 2023 9:59 am

Thank you, Jeff. That was really helpful. Here is the code I used. I do not know why I did not look at the convert.inp object earlier. Sometimes I just make things too difficult and overlook the obvious.

Code: Select all
library(RMark)

n.days <- 5

# Read the input file; it must contain only ch, freq and ptemp1..ptemp5
mydata <- convert.inp("fake.inp",
                      covariates = paste0("ptemp", seq_len(n.days)))
head(mydata)

# Columns 1 and 2 are ch and freq; the ptemp covariates start in column 3
ptempsq <- mydata[, 3:ncol(mydata)]^2
colnames(ptempsq) <- paste0("ptempsq", seq_len(n.days))
head(ptempsq)

# Covariates for epsilon and gamma: drop the first day's temperature
temp <- mydata[, 4:ncol(mydata)]
colnames(temp) <- paste0("temp", seq_len(n.days - 1))
head(temp)

tempsq <- temp^2
colnames(tempsq) <- paste0("tempsq", seq_len(n.days - 1))
head(tempsq)

# Append all derived covariates to the data frame
mydata <- cbind(mydata, ptempsq, temp, tempsq)
head(mydata)

Re: creating squared covariates within RMark

Post by jlaake » Wed Apr 19, 2023 11:28 am

The following is a simpler way to do it. I create some dummy data to illustrate.

Code: Select all
> dummy=data.frame(x1=1:5,x2=2:6,x3=3:7)
> fnames=colnames(dummy)
> newnames=paste(fnames,"sq",sep="")
> fnames
[1] "x1" "x2" "x3"
> newnames
[1] "x1sq" "x2sq" "x3sq"
> dummy[,newnames]=dummy[fnames]^2
> dummy
  x1 x2 x3 x1sq x2sq x3sq
1  1  2  3    1    4    9
2  2  3  4    4    9   16
3  3  4  5    9   16   25
4  4  5  6   16   25   36
5  5  6  7   25   36   49


I split it up to make it clearer, but it can actually be done in one line of code as shown below. R operates on vectors, which can really simplify your code if you can teach yourself to think that way.

Code: Select all
> dummy[,paste(colnames(dummy),"sq",sep="")]=dummy[colnames(dummy)]^2
> dummy
  x1 x2 x3 x1sq x2sq x3sq
1  1  2  3    1    4    9
2  2  3  4    4    9   16
3  3  4  5    9   16   25
4  4  5  6   16   25   36
5  5  6  7   25   36   49
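
For the data in this thread, the same vectorised assignment can be restricted to the temperature columns, since the one-liner above squares every column and the convert.inp data frame also holds ch and freq. The sketch below is only an illustration: the data frame is a hand-built stand-in for the convert.inp result (assumed layout: ch, freq, then ptemp1..ptemp5) so that it runs without MARK or an input file.

```r
n.days <- 5

# Stand-in for the convert.inp() result; values copied from the thread
mydata <- data.frame(
  ch   = c("1000000000", "0100100001", "0101001010"),
  freq = 1,
  matrix(c(12, 9, 10, 15, 20), nrow = 3, ncol = n.days, byrow = TRUE,
         dimnames = list(NULL, paste0("ptemp", seq_len(n.days)))),
  stringsAsFactors = FALSE
)

pnames <- paste0("ptemp", seq_len(n.days))      # detection covariates
later  <- pnames[-1]                            # days 2..n.days
tnames <- paste0("temp", seq_len(n.days - 1))   # epsilon/gamma covariates

# Square only the temperature columns; ch and freq are left alone
mydata[, paste0("ptempsq", seq_len(n.days))] <- mydata[, pnames]^2

# Epsilon/gamma covariates: the same values with day 1 dropped
mydata[, tnames] <- mydata[, later]
mydata[, paste0("tempsq", seq_len(n.days - 1))] <- mydata[, later]^2

head(mydata)
```

Because every column name is built from n.days, the same lines should carry over unchanged to 100 or more sample days.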