Brief History
Over the past 13 years, I and my colleagues at West Inc have had several dozen consulting projects that involved mark-recapture data. To do these projects, we needed a variety of software, but ended up writing much of our own because we needed to control some detail of the analysis. In about 1992, Dr. Bryan Manly wrote some Fortran code to do a CJS simulation. Since then, I have been modifying, expanding and maintaining that original code. When most of us at West converted to R, the code grew into an R package (named mra). It was a simple matter to share this package via the open-source facilities set up for R. Doing so has added benefit in that I am forced to keep up with documentation. (More history is in help(mra), available after installing and loading.)
Motivation
I am of the opinion that more software options are ultimately better for consumers. My purposes in sharing mra are to maintain my own code and documentation that I understand, and to share what I think might be useful to others in a collaborative setting. If others find it useful, great. If not, that is okay too. I need to maintain the code anyway. I view the sharing aspect of this effort as inherently collaborative.
Why Consider Using mra?
Parametrization: The reason I am maintaining my own code is that I (and Dr. Manly) originally struggled to understand the parameterization of MARK. I also struggle to understand the way MARK's parameterization gets translated by RMark. I now "get it", but we had to make progress back then. As a result, but also because Bryan Manly was and is not a MARK user, he (by accident) developed a different parameterization for mra that is easier for me to understand and convey to others. This parameterization makes model specification more "regression like", and this fact makes it easier to focus on modeling and biology, in my opinion.
For those familiar with the parameterization of MARK, mra's "PIM" matrices are essentially rectangular, one parameter per individual and occasion, and all effects are specified via occasion-and-individual varying covariate matrices. For those unfamiliar with MARK's PIMs, I believe you can ignore the concept of PIM's altogether and go straight to specifying effects in the model. We have presented this parameterization, and the regression equations it induces, at several workshops and in Amstrup et al (2005, "Handbook of Capture-Recapture Analysis", Princeton, p. 224). It has been well received each time. The CJS routine in Jeff Laake's newest version of RMark uses this parameterization.
Model specification: Models in mra are specified using standard R formula objects like '~x1+x2', where x1 and x2 are matrices. Assuming ch.hist is a nan x ns matrix containing capture histories, and x1 and x2 are nan x ns matrices containing occasion-and-individual varying covariate values, a model where capture probability is a function of x1 and
survival probability is a function of x2 can be specified by:
- Code: Select all
F.cjs.estim(capture=~x1,survival=~x2,histories=ch.hist).
In off-line communication with Jeff Laake, he made a good suggestion that I allow non-matrices (i.e., vectors) and factors in the model specification. I have done this in the newest version (v2.2) by including utility functions ivar and tvar. You fit a traditional time-varying CJS model by,
- Code: Select all
time = as.factor(1:ns)
F.cjs.estim(~tvar(time,nan,drop=c(1,2)), ~tvar(time,nan,drop=c(1,nan-1,nan-2)), ch.hist).
The drop= arguments are needed to reduce the number of parameters to those present in the traditional CJS model. By assigning "nan" and "drop.levels" attributes to the time vector, you can shorten model specification to just ~tvar(time) (see help(tvar)). If the drop= arguments are not specified, the model is over-parameterized and mra will report fewer degrees of freedom than estimated coefficients.
A linear time model for captures, and a linear time with additive sex effect for survival is fitted with code similar to
- Code: Select all
F.cjs.estim(~tvar(1:ns,nan), ~ivar(sex,ns)+tvar(1:ns,nan), ch.hist).
The general idea is that once we have covariate matrices or vectors defined, we can concentrate on biology and model selection in a way that is familiar from our regression courses.
The R Environment: A benefit of running an analysis in R is that it is a real programming environment. Data manipulation and model specification can be stored in a program (file) and run with one command. Consequently, re-running (potentially hundreds of) models after data modification is a snap. Plotting is also easy and flexible.
Models in mra:
By no stretch of the imagination does mra do everything that MARK does. At present, mra estimates CJS open models and Huggin's closed models, and does some associated plotting and goodness-of fit testing. I almost always end up estimating population size from CJS models (under appropriate assumptions!), and the CJS routine automatically computes the Horvitz-Thompson estimators and their complicated variances. These two models should cover a wide range of situations, but they do not cover all. I am analyzing multi-strata data at present, so this model should be added in the near future. A colleague has simulation code that I will add in future. See help(mra) for a call for other routines.
Finally, if you are familiar and comfortable with MARK or RMark, I encourage you to stick with them. If you are not familiar with MARK or RMark, or you are not comfortable with them, and you need to estimate a CJS or Huggin's model, I encourage you to give mra a try. With R open, you install mra by clicking on Packages -> Install Package(s)... At that point, a window will ask you to select a repository, then the long list of packages for R will appear in another window. Select mra from the list and it will download and install. Alternatively, execute
- Code: Select all
install.packages("mra", repos="http://cran.r-project.org")
Thanks and as always, comments are welcome. I confess that I do not follow Phi.dot as closely as I should (~1/week), so send me an email if you post something relevant.
Trent McDonald