Hello! I'm the programmer for the ECOCEAN Whale Shark Photo-identification Library (http://www.whaleshark.org). Our library is a Web-based, collaborative mark-recapture platform for the study of a single species (Rhincodon typus). We have been developing our web-based mark-recapture software since 2003, and along the way we have received a number of requests to apply it to other species. To make it available to others, I am working to release it as open source.
I am posting to ask: would a standard mark-recapture data management platform be of use to the Phi-Dot community?
In a nutshell, our software is a Java web application that can run on a laptop or on a web server. You can use it for very small, single-researcher projects, or you can deploy it to a web server and use its built-in security system to coordinate a small or large team of collaborators. For whaleshark.org, we rely on a globally distributed network of researchers to contribute data and to process it according to a simple set of guidelines posted on a wiki (http://www.whaleshark.org/wiki/doku.php). Location-based security ensures that all data is visible to every researcher, while permission to modify data is limited to the relevant members (e.g. Seychelles members cannot edit Maldives data without obtaining higher privileges). Because the software can be deployed to the web, it also supports public participation, which lets us collect much more data and work with more complex models.
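For anyone curious how a location-based rule like this can work, here is a minimal sketch under stated assumptions. The class and field names (LocationPermissions, Researcher, Encounter, locationCode, admin) are hypothetical and are not the actual whaleshark.org code. It assumes each record carries a location code, each researcher has a set of location codes they may edit, and "higher privileges" is modeled as a single admin flag.

```java
import java.util.Set;

/**
 * Hypothetical sketch of location-based permissions (not the real
 * whaleshark.org classes): every researcher may read every record, but may
 * edit only records whose location code is in their assigned set, unless
 * they hold an assumed global "admin" privilege.
 */
public class LocationPermissions {

    /** A researcher account and the location codes it is allowed to edit. */
    public record Researcher(String username, Set<String> editableLocations, boolean admin) {}

    /** A sighting record tagged with a location code such as "SC" or "MV". */
    public record Encounter(String id, String locationCode) {}

    /** Reading is global: any authenticated researcher can view any encounter. */
    public static boolean canView(Researcher r, Encounter e) {
        return r != null;
    }

    /** Editing is limited to the researcher's own locations, or to admins. */
    public static boolean canEdit(Researcher r, Encounter e) {
        if (r == null) {
            return false;
        }
        return r.admin() || r.editableLocations().contains(e.locationCode());
    }

    public static void main(String[] args) {
        Researcher seychellesMember = new Researcher("alice", Set.of("SC"), false);
        Encounter maldivesSighting = new Encounter("E-001", "MV");
        System.out.println(canView(seychellesMember, maldivesSighting)); // true
        System.out.println(canEdit(seychellesMember, maldivesSighting)); // false
    }
}
```

Separating canView from canEdit keeps the "globally visible" and "locally editable" rules independent, so either can be tightened later without touching the other.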
We built our Java-based software on top of DataNucleus (http://www.datanucleus.org/), a persistence layer that maps Java objects onto many different types of datastores, from small Excel files to relational databases, all the way up to Amazon S3. This means that existing mark-recapture databases could potentially be mapped into, and managed by, our software with *relatively* little effort. A complete list of supported datastores is available from http://www.datanucleus.org/products/acc ... tores.html. In addition, for relational databases our basic data model is based on the Darwin Core (http://rs.tdwg.org/dwc/terms/), which allows us to easily use TapirLink (http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink) to expose mark-recapture data to larger biodiversity frameworks, such as GBIF (http://www.gbif.org) and OBIS (http://www.iobis.org/).
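To illustrate what basing a data model on the Darwin Core means in practice, here is a small sketch that flattens one sighting into Darwin Core term/value pairs. The term names used (occurrenceID, organismID, scientificName, eventDate, decimalLatitude, decimalLongitude, locality) are standard Darwin Core terms. The Encounter record and the toDarwinCore method are hypothetical illustrations, not our actual schema or the way TapirLink is configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: map an illustrative Encounter object onto standard
 * Darwin Core term names so that it could be shared with biodiversity
 * networks. The Encounter fields are assumptions, not the real schema.
 */
public class DarwinCoreExport {

    public record Encounter(String encounterId, String individualId, String date,
                            double latitude, double longitude, String locality) {}

    /** Map one encounter to Darwin Core term/value pairs in a stable column order. */
    public static Map<String, String> toDarwinCore(Encounter e) {
        Map<String, String> dwc = new LinkedHashMap<>();
        dwc.put("occurrenceID", e.encounterId());
        dwc.put("organismID", e.individualId());      // the photo-identified individual
        dwc.put("scientificName", "Rhincodon typus");
        dwc.put("eventDate", e.date());                // ISO 8601 date string
        dwc.put("decimalLatitude", Double.toString(e.latitude()));
        dwc.put("decimalLongitude", Double.toString(e.longitude()));
        dwc.put("locality", e.locality());
        return dwc;
    }

    public static void main(String[] args) {
        Encounter e = new Encounter("E-001", "A-101", "2009-04-15", -4.6, 55.5, "Mahe, Seychelles");
        toDarwinCore(e).forEach((term, value) -> System.out.println(term + "=" + value));
    }
}
```

Because every record uses the same shared vocabulary, an external harvester only needs to understand Darwin Core, not our internal class names.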
A list of current features (in a whale shark context) is posted here: http://www.ecocean.org/forum/index.php/topic,309.0.html
In summary, the platform allows you to:
-Quickly start a new mark-recapture project with existing, easy-to-install software. You can also tailor it (Java programming required) to meet your specific project's needs.
-Access different mark-recapture datastore types (e.g. relational databases, Excel files, etc.) through a common Java API. This could support pluggable functionality: for example, the community could create and share plug-ins that export data for specific models in MARK, or that perform customized data mining.
-Manage mark-recapture data through a Web interface and safely collaborate by relying on authentication, authorization, and auditing to control and monitor access.
-Allow the public to collect and contribute data through a web interface.
-Add computer-assisted identification techniques as they become available. In a whale shark context, we use two pattern recognition algorithms to sift through a global database and suggest matches between new data and previously identified whale sharks.
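As an illustration of the kind of plug-in a common Java API could enable, here is a minimal sketch of a MARK export. Everything named here (ExportPlugin, Sighting, MarkEncounterHistoryExporter) is hypothetical and is not part of the current software. It assumes sightings have already been assigned to sampling occasions numbered from 0, and it writes one encounter history per individual in the usual MARK input style: a string of 0/1 values, one per occasion, followed by a frequency and a terminating semicolon, with the individual ID in a /* */ comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical plug-in sketch: convert sightings into MARK-style encounter
 * histories ("1010 1;"). Names and the occasion model are assumptions.
 */
public class MarkExportSketch {

    /** One identification of an individual on a sampling occasion (0-based). */
    public record Sighting(String individualId, int occasion) {}

    /** Hypothetical plug-in contract: turn sightings into a text export. */
    public interface ExportPlugin {
        String export(List<Sighting> sightings, int numOccasions);
    }

    /** Builds one encounter history per individual, each with frequency 1. */
    public static class MarkEncounterHistoryExporter implements ExportPlugin {
        @Override
        public String export(List<Sighting> sightings, int numOccasions) {
            // TreeMap keeps individuals in sorted, deterministic order.
            Map<String, char[]> histories = new TreeMap<>();
            for (Sighting s : sightings) {
                if (s.occasion() < 0 || s.occasion() >= numOccasions) {
                    throw new IllegalArgumentException("Occasion out of range: " + s);
                }
                char[] h = histories.computeIfAbsent(s.individualId(), id -> {
                    char[] blank = new char[numOccasions];
                    Arrays.fill(blank, '0');
                    return blank;
                });
                h[s.occasion()] = '1'; // repeat sightings in one occasion count once
            }
            StringBuilder out = new StringBuilder();
            for (Map.Entry<String, char[]> entry : histories.entrySet()) {
                out.append("/* ").append(entry.getKey()).append(" */ ")
                   .append(entry.getValue()).append(" 1;\n");
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        List<Sighting> data = List.of(
                new Sighting("A-101", 0), new Sighting("A-101", 2),
                new Sighting("A-102", 1), new Sighting("A-101", 2));
        System.out.print(new MarkEncounterHistoryExporter().export(data, 4));
        // /* A-101 */ 1010 1;
        // /* A-102 */ 0100 1;
    }
}
```

Keeping the export behind a small interface means another plug-in (for a different model or data-mining task) could consume the same list of sightings without changes to the core data layer.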
Back to my question: would an open source, Java-based, generic mark-recapture platform be useful as a community-maintained project going forward? I welcome your feedback. If you would like to try the software when it is ready, please contact me off-list at jason at whaleshark dot org.
Thanks,
Jason Holmberg
ECOCEAN Whale Shark Photo-identification Library
http://www.whaleshark.org