www.phidot.org

by **markmiller** » Fri Dec 13, 2013 11:14 pm

I am using the robust design occupancy model with initial occupancy (psi), annual colonization (gamma) and annual extinction (epsilon), and a data set containing a large number of years and 550 individual covariates. When I try to run the model psi(.)eps(.)gam(.)p(year), which does not include any of the individual covariates present in the data set, Program MARK returns the error ‘Encounter history was too short for the number of occasions specified’.

I think I have isolated a line in the data set (Site 148) that is generating this error. However, nothing about that line seems unusual or incorrect.
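
The error message points at the length of the encounter history. A quick check outside MARK is to parse each record and compare its history length and covariate count with the numbers the model expects. Below is a minimal Python sketch of such a check. `check_records` is a hypothetical helper, not part of MARK or RMark. It assumes a single group (one frequency column), whitespace-separated fields, `/* ... */` comments and records terminated by `;`, which is the layout of the .inp examples in this thread.

```python
import re

# Matches /* ... */ comments, including ones that span lines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def check_records(text, n_occasions, n_covariates, n_groups=1):
    """Check every record in the text of an .inp file.

    Assumed layout (hypothetical, single group by default):
        history freq_1 ... freq_g cov_1 ... cov_k ;
    Returns a list of (record_number, problem) tuples; an empty list means
    every record has the expected history length and covariate count.
    """
    problems = []
    body = COMMENT.sub(" ", text)  # drop comments before splitting
    # Split on ';' rather than on newlines, so a record may span several lines.
    records = [chunk.split() for chunk in body.split(";")]
    records = [fields for fields in records if fields]  # skip empty chunks
    for number, fields in enumerate(records, start=1):
        history = fields[0]
        if len(history) != n_occasions:
            problems.append(
                (number, f"history length {len(history)}, expected {n_occasions}"))
        n_cov = len(fields) - 1 - n_groups
        if n_cov != n_covariates:
            problems.append(
                (number, f"covariate count {n_cov}, expected {n_covariates}"))
    return problems


# For the data in this thread the call would be, e.g.:
#   check_records(open("artificialA.inp").read(), n_occasions=110, n_covariates=550)
example = "/* 1 */ 0110 1 5.32 -15.12;\n/* 2 */ 011 1 5.3\n -15.1;\n"
print(check_records(example, n_occasions=4, n_covariates=2))
# [(2, 'history length 3, expected 4')]
```

An empty result only says that the records are internally consistent. It cannot detect problems in how a program reads very long physical lines.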

If I restrict the data to just 12 sites (Sites 138-149), the above error is generated. If I delete Site 148 from that set of 12 sites, the error is not generated and the model runs.

If I remove all of the individual covariates from the data set and restrict myself to the same 12 sites (still including Site 148), without changing any of their capture histories, no error is generated and the model runs.

If I round a subset of the covariate data for Site 148 only, no error is generated and the model runs. The rounding removes one decimal place, changing, for example, 5.32 to 5.3.
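
Rounding a value such as 5.32 to 5.3 also makes the written line one character shorter for every value that had two decimal places. That is consistent with the line-length explanation reached later in the thread. Below is a minimal Python sketch of this workaround for a single record. It assumes the hypothetical layout `history frequency cov_1 ... cov_k;` with no comments. `round_covariates` is an illustrative name, not an existing tool.

```python
def round_covariates(record, n_fixed=2, digits=1):
    """Round the covariates of one .inp record to `digits` decimal places.

    The first `n_fixed` fields (history and frequency) are left unchanged.
    Format 'g' drops trailing zeros (so -15.0 is written as -15) but keeps at
    most 6 significant digits, which is enough for values like these.
    Python's round() acts on the binary float, so values ending in 5 may round down.
    """
    fields = record.strip().rstrip(";").split()
    fixed, covariates = fields[:n_fixed], fields[n_fixed:]
    rounded = [format(round(float(value), digits), "g") for value in covariates]
    return " ".join(fixed + rounded) + ";"


before = "0110 1 5.32 -15.12 0.56 0;"
after = round_covariates(before)
print(after)                    # 0110 1 5.3 -15.1 0.6 0;
print(len(before), len(after))  # 26 23
```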

I started to think there might be a limit on the number of characters allowed in a line of data. However, before rounding, the line for Site 148 has the same number of characters as the surrounding lines (1534, not counting spaces).
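
A count that excludes spaces understates the physical line length. If the fields are space-separated, 550 covariates add at least 550 spaces to each line beyond the count without spaces. Below is a minimal Python sketch that reports both counts for every line and lists the longest lines. The function names are illustrative, and no particular length limit is assumed.

```python
def line_lengths(text):
    """Return (line_number, total_chars, chars_without_spaces) for each line."""
    return [
        (number, len(line), len(line.replace(" ", "")))
        for number, line in enumerate(text.splitlines(), start=1)
    ]


def longest_lines(text, top=3):
    """Return (line_number, total_chars) for the `top` longest lines, longest first."""
    by_length = sorted(line_lengths(text), key=lambda row: row[1], reverse=True)
    return [(number, total) for number, total, _ in by_length[:top]]


# For a real file: longest_lines(open("artificialA.inp").read())
sample = "0110 1 5.32 -15.12;\n0110 1 5.3 -15.1;\n"
print(line_lengths(sample))      # [(1, 19, 16), (2, 17, 14)]
print(longest_lines(sample, 1))  # [(1, 19)]
```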

I am also analyzing the same data with RMark and do not obtain this error when using that R package.

Thank you for any suggestions on what might be causing this error and how to correct it. For now, perhaps I could apply the rounding workaround described above after the fact and verify that RMark returns the same estimates. I can provide additional information if that would be helpful.

by **jlaake** » Sat Dec 14, 2013 8:57 am

You need to understand that RMark only passes to mark.exe the covariate data that are used in the model. So if you fit the dot model, none of the covariate data are written to the inp file passed to mark.exe, which would effectively remove any problem in those data, if one exists. --jeff
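
Below is a toy Python illustration of Jeff's point. It is not RMark's actual code, which is written in R. If each record is built only from the covariates that appear in the model, a dot model produces records with no covariate fields at all, so nothing in the covariate values can affect how mark.exe reads them. The function name and the dictionary layout are hypothetical.

```python
def build_record(history, freq, covariates, used):
    """Build one .inp record that includes only the covariates named in `used`.

    covariates: dict of covariate name -> value for this site (hypothetical layout).
    used: covariate names that appear in the model; empty for a dot model.
    """
    values = [str(covariates[name]) for name in used]
    return " ".join([history, str(freq)] + values) + ";"


site = {"cov1": -15.12, "cov2": 0.56}
print(build_record("0110", 1, site, used=[]))        # 0110 1;
print(build_record("0110", 1, site, used=["cov1"]))  # 0110 1 -15.12;
```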

by **markmiller** » Sun Dec 15, 2013 3:10 am

Thank you, Jeff. I think your comments help me understand why the error only appears when I use Program MARK directly instead of calling Program MARK via RMark.

I have continued working on trying to isolate the problem that generates the error. At this point I am starting to think the error arises if the number of decimal places varies among the values of a given covariate within a single line.

Below is some R code that generates two artificial data sets. The error arises with the first data set but not with the second. When I posted this late yesterday, I thought the only difference between the data sets might be that the second uses trailing zeros within a line within a covariate. However, after looking at the two data sets again, neither appears to use trailing zeros. So I still do not know why one data set generates the error and the other does not.

Nevertheless, the R code below should enable anyone who wishes to generate the data sets and reproduce the error in Program MARK using 110 visits, 55 seasons and 550 individual covariates.

I will return to this problem shortly and post an update if I learn more.

```r
# change working directory as appropriate
setwd('c:/users/mmiller21/RMark/Oct_2013/')

##################################################################
set.seed(1234)
n.rows <- 12
n.cols <- 110
prob.1s <- 0.5

# generate capture histories
histories <- t(replicate(n.rows, rbinom(n.cols, 1, prob.1s)))
head(histories)
a.a <- as.matrix(apply(format(histories), 1, paste, collapse=""), nrow = n.rows)
head(a.a)

# generate counts
a.b <- rep(1, n.rows)

# generate covariates
a.c <- matrix(0, nrow=n.rows, ncol=550)

# substitute in non-zero covariates for Site 148
s148 <- c(-15.12, -15.12, -15.1 ,
          -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.1 , -15.1 , -15.1 , -15.12, -15 , -15.12, -15.1 , -15.12, -15.12, -15.12,
          -15.1 , -15.12, -15.1 , -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.1 , -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.12, -15.12, -15.12, -15.1 ,
          -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.1 , -15.1 , -15.1 , -15.12, -15 , -15.12, -15.1 , -15.12, -15.12, -15.12,
          -15.1 , -15.12, -15.1 , -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.1 , -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12, -15.12,
          -15.12, -15.12,
          0 , 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.5 , 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56,
          0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.5 , 0.5 ,
          0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56, 0)
a.c[11,1:(55*4)] <- s148

# generate end of line marker
a.d <- rep(';', n.rows)

# generate Program MARK input file
a5 <- data.frame(a.a, a.b, a.c, a.d)
dim(a5)

# write Program MARK input file
write.table(a5, 'artificialA.inp', sep=' ', col.names=FALSE, row.names=FALSE, quote=FALSE)

##################################################################
set.seed(1234)
n.rows <- 12
n.cols <- 110
prob.1s <- 0.5

# generate capture histories
histories <- t(replicate(n.rows, rbinom(n.cols, 1, prob.1s)))
head(histories)
a.a <- as.matrix(apply(format(histories), 1, paste, collapse=""), nrow = n.rows)
head(a.a)

# generate counts
a.b <- rep(1, n.rows)

# generate covariates
a.c <- matrix(0, nrow=n.rows, ncol=550)

# substitute in non-zero covariates for Site 148
# always using 2 decimal places
s148 <- c(rep(-15.12, (55*2)), rep(0.56, (55*2-1)), 0)
s148[sample((55*2) , 5, replace = FALSE)] <- -15.1
s148[sample((55*2) , 5, replace = FALSE)] <- -15
s148[sample((55*3):(55*4-1), 5, replace = FALSE)] <- 0.5
a.c[11,1:(55*4)] <- s148

# generate end of line marker
a.d <- rep(';', n.rows)

# generate Program MARK input file
a5 <- data.frame(a.a, a.b, a.c, a.d)
dim(a5)

# write Program MARK input file
write.table(a5, 'artificialB.inp', sep=' ', col.names=FALSE, row.names=FALSE, quote=FALSE)
##################################################################
```

by **cooch** » Sun Dec 15, 2013 11:16 am

markmiller wrote:

> Thank you, Jeff. I think your comments help me understand why the error only appears when I use Program MARK directly instead of calling Program MARK via RMark.

It is extremely unlikely that something works with RMark that doesn't work in standard MARK. In fact, my strong prior is that you're making a mistake using standard MARK.

> I have continued working on trying to isolate the problem that generates the error. At this point I am starting to think the error arises if the number of decimal places varies among the values of a given covariate within a single line.

Not likely. I took any number of files with individual covariates, randomly truncated the number of digits after the decimal place, and MARK processed things without any problem (although the results changed, of course).

> Nevertheless, the R code below should enable anyone who wishes to generate the data sets and reproduce the error in Program MARK using 110 visits, 55 seasons and 550 individual covariates.

Please reduce the problem to something tractable first. No one is going to try to replicate your problem at that scale. You claim that MARK can't handle covariates of differing string lengths. So, 'prove it' with a smaller example, and one that isn't completely artificial (i.e., only one record has non-zero covariates?). Try cutting it down to something simpler: say, 11 visits, 5 seasons, 55 covariates.

Further, with only 12 lines in the file, MARK will complain that this isn't a realistic data set.

Moreover, I just tried your artificialB example (despite how weird it is), and it ran fine in classic MARK.

by **jlaake** » Sun Dec 15, 2013 11:20 am

Evan-

I think you must have missed my post regarding RMark. RMark only outputs covariate data that are used in the model, so when he fit the dot model, no covariate data were sent to mark.exe, and as a result there was no problem reading the data.

--jeff

by **markmiller** » Sun Dec 15, 2013 11:52 am

Thanks, Evan. Yes, B does run. A is the one that does not run. Perhaps I made a typo when writing the post.

As for reducing the number of visits and covariates in the example, I can try that, but I suspect the number of visits and/or covariates might be somehow contributing to the error. That is why I chose instead to reduce the number of sites. Using a small number of sites does cause a warning, but I was not worried about that.

Nevertheless, I will continue to work on this later today and can try to adopt some of your suggestions in my next update.

by **cooch** » Sun Dec 15, 2013 5:44 pm

markmiller wrote:

> Thanks, Evan. Yes, B does run. A is the one that does not run. Perhaps I made a typo when writing the post.
>
> As for reducing the number of visits and covariates in the example, I can try that, but I suspect the number of visits and/or covariates might be somehow contributing to the error. That is why I chose instead to reduce the number of sites. Using a small number of sites does cause a warning, but I was not worried about that.
>
> Nevertheless, I will continue to work on this later today and can try to adopt some of your suggestions in my next update.

Sure, because you've probably exceeded MARK's limits on the number of occasions/covariates. That's why I suggest trying something smaller. You would still need to demonstrate that the number of significant digits in a covariate is the problem, which, again, is unlikely: I simulated a data set with 50 individual covariates of varying length after the decimal point, and MARK ran fine. Only Gary can answer for sure, but I'd bet significant money that you've exceeded the number of covariates MARK will allow (the error message you report is typical of what you see if MARK wraps the end of an input line before the semi-colon, which it would do if the line is > max).

by **gwhite** » Sun Dec 15, 2013 5:48 pm

I suspect that the problem is the line length in the input file. There is no upper limit on the number of occasions, covariates, or groups, except for computer memory. I suggest you break your lines up into smaller pieces. Have you tried the "List Data" option on the run window to see which line is causing the problem?

by **gwhite** » Sun Dec 15, 2013 11:44 pm

Your problem is caused by the very large record length (line length) of the encounter histories file. If you open the .inp file with Notepad, you'll see that Windows forces the lines to wrap, even though you have not specified this option in Notepad. Further, if you insert a blank in front of the offending wrapped line and save the file, it works fine when you replace the encounter histories in MARK.

by **cooch** » Mon Dec 16, 2013 9:34 am

So, my 'best guess' was partially correct:

> ...the error message you report is typical of what you see if MARK wraps the end of an input line before the semi-colon, which it would do if the line is > max).

This seems to be a weird interaction between GNU FORTRAN (gfortran) and Windows. There are a couple of solutions: the one Gary mentioned or, alternatively, routinely breaking long records across several lines of your input file using comment delimiters. So, instead of

```
/* 11 */ 10110101 1 52.6 -45 -55.8 56.7 52.6 -45 -55.1 56.7 52.3 -45 -55.8 56.7;
```

you could also use

```
/* 11 */ 10110101 1 52.6 -45 -55.8 56.7 52.6 -45
/* 11 */ -55.1 56.7 52.3 -45 -55.8 56.7;
```

Doing this with Mark's example .inp file works fine, and MARK processes it perfectly. It is fairly easy in SAS (which I know) to handle this sort of 'line breaking with comments'. I suspect it is also straightforward in R (which I don't know).
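
No R version was posted, so here is a minimal sketch of the same 'line breaking with comments' idea in Python; the logic ports directly to R. `wrap_record` is a hypothetical helper. It splits one record (given without its leading comment) into physical lines of at most `max_len` characters. It never splits a field and starts every line with the same comment label, as in the two-line example above. The 80-character default is an arbitrary safe choice, not a documented MARK limit.

```python
def wrap_record(record, max_len=80, label=""):
    """Break one .inp record into lines of at most max_len characters.

    Every line starts with `label` (e.g. "/* 11 */") when one is given.
    Fields are never split; a field too long to fit gets a line to itself.
    """
    prefix = f"{label} " if label else ""
    lines, current = [], []
    for field in record.split():
        trial = prefix + " ".join(current + [field])
        if current and len(trial) > max_len:
            lines.append(prefix + " ".join(current))  # close the full line
            current = [field]
        else:
            current.append(field)
    if current:
        lines.append(prefix + " ".join(current))
    return "\n".join(lines)


record = "10110101 1 52.6 -45 -55.8 56.7 52.6 -45 -55.1 56.7 52.3 -45 -55.8 56.7;"
print(wrap_record(record, max_len=50, label="/* 11 */"))
# /* 11 */ 10110101 1 52.6 -45 -55.8 56.7 52.6 -45
# /* 11 */ -55.1 56.7 52.3 -45 -55.8 56.7;
```

With `max_len=50`, this reproduces the two-line layout shown above. For a whole file, apply it to each record and write the resulting lines back out.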

At some point, a fix will be figured out. You're only likely to run into this bug (and yes, it was a real bug, not in MARK but in the combination of gfortran and Windows) with *really* long lines in the .inp file. I'm not entirely sure what constitutes 'really long', but Mark's example clearly reached and went past that point.

Error with large number of individual covariates
