
creating capture histories in R

Posted: Wed Feb 05, 2014 3:49 pm
by jlaake
I recently had an exchange with a student in an R user's group about creating capture histories from capture/recapture data in R. In the past I've used pivot tables in Excel, but it is nice to be able to keep your analysis in one piece of software. I can't remember if I've posted on this before (getting old) and a search came up empty, so I put the following together.

His original question was how to create a capture history by year when he had a dataframe of animal ids and the dates on which each animal was captured or recaptured. This led to the following idea, which evolved from the input in the group. If the variable is of class Date, then you can use

Code:
# create some dummy dates from tomorrow to 20 days from today
x = Sys.Date()+1:20
# extract the year and change to numeric
as.numeric(format(x, "%Y"))
# you can also extract the month and day with
as.numeric(format(x, "%m"))
as.numeric(format(x, "%d"))


Other alternatives exist with the POSIX date/time classes. Once the years are extracted, he can use the table function.
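
As a rough sketch of both ideas, the snippet below pulls the year out with as.POSIXlt (one of the POSIX alternatives) and then tabulates id by year. The dataframe caps and its contents are made up for illustration. Converting year to a factor with explicit levels is an extra step, not part of the original exchange, so that a year with no captures still gets a column of zeros.

```r
# hypothetical capture records: animal id and capture date
caps = data.frame(id = c(1, 1, 2, 3, 3, 3),
                  date = as.Date(c("2010-05-01", "2012-06-15", "2010-07-04",
                                   "2010-05-20", "2010-08-01", "2012-09-09")))
# POSIXlt stores the year as years since 1900
caps$year = as.POSIXlt(caps$date)$year + 1900
# this should agree with the format() approach
stopifnot(all(caps$year == as.numeric(format(caps$date, "%Y"))))
# explicit levels keep 2011 (no captures) as a column of zeros
caps$year = factor(caps$year, levels = 2010:2012)
# count table with id for rows and years for columns, then reduce to 0/1
ch = with(caps, table(id, year))
ch[ch > 0] = 1
ch
```

Without the explicit levels, table would only produce columns for years that appear in the data, and the histories would silently skip an occasion.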


But more generally you'll want to use the cut function with date intervals, which I use routinely. Here I create some dummy data and time intervals that are about 1/2 of a year long. The intervals could be any series of dates. See ?cut and ?cut.Date for help on how end points are treated.

Code:
# create dummy capture data; id is animal and date is the date it was captured or recaptured
df=data.frame(id=floor(runif(100,1,50)),date=runif(100,0,5000)+as.Date("1980-01-01"))
# create some dummy date intervals that are approximately every 6 months
intervals=as.Date("1979-01-01")+seq(180,15*365,182.5)


Now cut the observations into intervals, loosely called occasions here. The intervals do not have to be equal, so this approach will work as well for a robust design.

Code:
# cut the dates into intervals
occasions=cut(df$date,intervals)
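
To see how cut treats end points for Date objects, here is a small check with made-up dates and breaks (the names breaks, x and occ are just for this illustration). As far as I can tell from ?cut.Date, the default is right=FALSE, so each interval includes its start date and excludes its end date, and anything outside the breaks becomes NA.

```r
# hypothetical breaks defining two half-year occasions in 2000
breaks = as.Date(c("2000-01-01", "2000-07-01", "2001-01-01"))
# test dates: one before the first break, one on each break, one just inside
x = as.Date(c("1999-12-31", "2000-01-01", "2000-06-30",
              "2000-07-01", "2001-01-01"))
occ = cut(x, breaks)
# show each date next to the occasion it was assigned to
data.frame(x, occ)
# number of records that fall outside the intervals
sum(is.na(occ))
```

Each occasion is labelled with the start date of its interval. It is worth running sum(is.na(occasions)) on real data before tabulating, since those records will not appear in the capture history.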


Creating a capture history can be done easily with the table function. Note that the table function excludes records containing an NA by default, so any dates that fall outside the intervals are dropped.

Code:
# create the count table with id for rows and occasions for columns
ch=with(df,table(id,occasions))


The table will tally all records for the animal in that date interval, and for a capture history we only want 0/1, which can easily be done as follows:

Code:
# can be caught multiple times in an occasion; change all >0 to 1
ch[ch>0]=1
ch


To use with RMark you need to create a capture history string, which can be done by using the apply function on the ch matrix with the paste function:

Code:
# create capture history as a string
chstr=apply(ch,1,paste,collapse="")
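
Putting the steps so far together on a tiny made-up dataset makes it easy to check the result by eye. The names recs, breaks and occ, and all of the values, are hypothetical.

```r
# hypothetical records for three animals
recs = data.frame(id = c("a", "a", "b", "b", "b", "c"),
                  date = as.Date(c("2000-02-01", "2001-03-01", "2000-05-01",
                                   "2000-06-01", "2000-09-01", "2001-02-01")))
# three half-year occasions
breaks = as.Date(c("2000-01-01", "2000-07-01", "2001-01-01", "2001-07-01"))
occ = cut(recs$date, breaks)
# counts by animal and occasion, reduced to 0/1
ch = table(recs$id, occ)
ch[ch > 0] = 1
# one capture history string per animal
chstr = apply(ch, 1, paste, collapse = "")
chstr
```

Animal a is seen in occasions 1 and 3, b is seen twice in occasion 1 and once in occasion 2, and c only in occasion 3, so the strings should come out as 101, 110 and 001.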


Then you can put it into a dataframe and analyze it with RMark as shown below. When you put it into the dataframe, set stringsAsFactors=FALSE so it is not turned into a factor variable (factor conversion was the default before R 4.0.0).

Code:
markdata=data.frame(ch=chstr,stringsAsFactors=FALSE)
# use RMark to analyze the data
library(RMark)
mark(markdata,model="POPAN")


This can get a little messier if you have original capture records in one file and recapture records in another. You can either pool the data together or create separate capture histories and then add them. One important point is to make id in the recapture data a factor with the same levels as id in the original capture data, because not all animals will have been recaptured. This makes sure the tables have the same number and order of rows. If the dataframes are called capture and recapture, the fields are both called id, and capture$id is already a factor, then you can do the following:

Code:
recapture$id=factor(as.character(recapture$id),levels=levels(capture$id))
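
Here is a minimal sketch of the "create separate capture histories and then add them" route, using made-up capture and recapture dataframes. It assumes both sets of dates are cut with the same breaks so the columns match as well as the rows.

```r
# hypothetical first captures (id is a factor) and later recaptures
capture = data.frame(id = factor(c("a", "b", "c")),
                     date = as.Date(c("2000-02-01", "2000-03-01", "2000-08-01")))
recapture = data.frame(id = c("c", "a"),
                       date = as.Date(c("2001-02-01", "2000-09-01")))
# give recapture ids the same levels as capture ids; b was never recaptured
recapture$id = factor(as.character(recapture$id), levels = levels(capture$id))
# same breaks for both, so both tables get the same occasion columns
breaks = as.Date(c("2000-01-01", "2000-07-01", "2001-01-01", "2001-07-01"))
ch1 = table(capture$id, cut(capture$date, breaks))
ch2 = table(recapture$id, cut(recapture$date, breaks))
# the tables must match before they can be added
stopifnot(identical(dim(ch1), dim(ch2)))
ch = ch1 + ch2
ch[ch > 0] = 1
chstr = apply(ch, 1, paste, collapse = "")
chstr
```

Because table keeps unused factor levels, animal b still gets a row of zeros in ch2 and the two tables can be added cell by cell.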


You can also use this same approach with the L/D format. You would simply create a history of dead recoveries with the appropriate date intervals and create a table of id by occasion counts, as shown above. You'll need to make sure that id and occasions have the same levels in both tables so that you get the same rows and columns. With k intervals the combined history has 2k columns. If your matrices are ch (live) and dch (dead), then the following code will interleave them.

Code:
fullmat=matrix(NA,nrow=nrow(ch),ncol=2*ncol(ch))
fullmat[,seq(1,2*ncol(ch),2)]=ch
fullmat[,seq(2,2*ncol(ch),2)]=dch
fullch=apply(fullmat,1,paste,collapse="")


Any "NA" in the fullch strings would indicate that something went amiss. You'll want to check dim(ch) and dim(dch) to make sure they are the same.
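
As a quick illustration of the interleaving and the checks, here is the same code run on two small made-up 0/1 matrices, with the dimension and NA checks written as stopifnot calls. The values in ch and dch are hypothetical.

```r
# hypothetical live (ch) and dead (dch) 0/1 matrices: 2 animals, 3 occasions
ch  = matrix(c(1, 0, 1,
               1, 1, 0), nrow = 2, byrow = TRUE)
dch = matrix(c(0, 0, 1,
               0, 1, 0), nrow = 2, byrow = TRUE)
# the two matrices must line up row for row and column for column
stopifnot(identical(dim(ch), dim(dch)))
fullmat = matrix(NA, nrow = nrow(ch), ncol = 2 * ncol(ch))
fullmat[, seq(1, 2 * ncol(ch), 2)] = ch   # odd columns: live
fullmat[, seq(2, 2 * ncol(ch), 2)] = dch  # even columns: dead
# no cell should be left unfilled
stopifnot(!anyNA(fullmat))
fullch = apply(fullmat, 1, paste, collapse = "")
fullch
```

The first animal is seen alive in occasions 1 and 3 and recovered dead in occasion 3, giving 100011. The second is seen alive in occasions 1 and 2 and recovered dead in occasion 2, giving 101100.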

I hope this will be useful for folks. If you have different ideas or approaches that you think are better, please share them. It would be nice to have some general functions to create capture histories for the various models, if some clever person out there has some free time. If they were fairly general I could include them in RMark to avoid everyone having to re-create the "wheel".

regards --jeff

Re: creating capture histories in R

Posted: Thu Feb 13, 2014 8:14 pm
by mcmelnychuk
Jeff,
Thank you for that post - I found it very useful. Here is one way of generalizing your example code to a multi-state model. There may be more elegant ways, and I haven't thoroughly checked this, but a couple of spot checks suggest that it does what is intended.
Mike

Code:
df=data.frame(id=floor(runif(1000,1,50)), date=runif(1000,0,5000)+as.Date("1980-01-01"), stratum=floor(runif(1000,1,4))) #increased n and added column for 3 strata
intervals=as.Date("1979-01-01")+seq(180,15*365,182.5)
occasions=cut(df$date,intervals)
ch=with(df,table(id,occasions))
recapsByStrata=with(df,table(id,occasions, stratum)) #like ch, but split by strata
print(c(length(ch), sum(ch), unique(ch)))
print(c(length(recapsByStrata), sum(recapsByStrata), unique(recapsByStrata)))
for(i in seq_along(attr(ch,"dimnames")$id)){
  for(j in seq_along(attr(ch,"dimnames")$occasions)){
    ch[i,j] <- ifelse(ch[i,j]>0, which.max(recapsByStrata[i,j,]),0) #replace total counts in ch with the stratum that had the most counts
  }
}
print(c(length(ch), sum(ch), unique(ch)))
chstr=apply(ch,1,paste,collapse="")
#head(chstr,20)

Re: creating capture histories in R

Posted: Tue Feb 18, 2014 12:34 pm
by jlaake
Mike-

Thanks for that. Here is a slight mod that avoids the for loops and also shows how to use alpha characters in place of numeric ones. Note that I'm not endorsing the use of max to specify the stratum when an animal is seen in multiple strata within an occasion. It may not be a bad approach, but ties could be problematic. The code below will work when an animal is seen in only one stratum in an occasion. --jeff

Code:
# dummy data: 1000 records with id, date and one of 3 strata
df=data.frame(id=floor(runif(1000,1,50)),
date=runif(1000,0,5000)+as.Date("1980-01-01"),
stratum=floor(runif(1000,1,4)))
intervals=as.Date("1979-01-01")+seq(180,15*365,182.5)
occasions=cut(df$date,intervals)
# 0/1 capture history
ch=with(df,table(id,occasions))
ch[ch>0]=1
recapsByStrata=with(df,table(id,occasions, stratum))
# stratum number with the most records in each cell; 0 where not seen
ch=apply(recapsByStrata,c(1,2),which.max)*ch
# replace stratum numbers with letters
strata=c("A","B","C")
ch[ch>0]=strata[ch[ch>0]]
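
Since ties are the worry with which.max, one way to look for them is to count how many strata each animal was seen in within each occasion. This is only a sketch with made-up data, and the names byStrata and nStrata are hypothetical. which.max returns the first maximum, so a tie would silently go to the lowest-numbered stratum.

```r
# hypothetical records; animal 1 is seen in strata 1 and 2 during occasion 1
df = data.frame(id = c(1, 1, 2, 2, 3),
                occasion = factor(c(1, 1, 1, 2, 2)),
                stratum = c(1, 2, 1, 1, 3))
byStrata = with(df, table(id, occasion, stratum))
# number of distinct strata with at least one record in each id x occasion cell
nStrata = apply(byStrata > 0, c(1, 2), sum)
nStrata
# cells where the animal was seen in more than one stratum
which(nStrata > 1, arr.ind = TRUE)
```

Any cell flagged here needs a decision about which stratum to assign before building the capture history.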

Re: creating capture histories in R

Posted: Tue Feb 18, 2014 2:44 pm
by mcmelnychuk
Much improved - thank you again!