Parallel simulations for median c-hat?

Post by Aline » Wed Jan 20, 2016 1:17 pm

I'm running a median c-hat GOF test in MARK (on a model exported from RMark). It's a complicated model with several states and many occasions and the test is taking a long time to run, even with few simulations. I noticed while it was running that it was only using one core of my multi-core processor. When it got past the simulations to the logistic regression part it used all cores. It seems to me (admittedly, not the most computer-literate person) that the simulations would be perfect candidates for parallel processing. Is there any way to do that?
Aline
 
Posts: 8
Joined: Fri Jun 19, 2015 2:46 pm

Re: Parallel simulations for median c-hat?

Post by cooch » Wed Jan 20, 2016 3:30 pm

Aline wrote: I'm running a median c-hat GOF test in MARK (on a model exported from RMark). It's a complicated model with several states and many occasions and the test is taking a long time to run, even with few simulations.


A colleague (who shall remain nameless, but lives in the Ft. Collins area) has frequently opined that compute time is cheap, relative to the cost of collecting the data, so the issue of something taking 'a long time' (the premise being, too long), is in fact a relative one (wait until you go Bayesian, and it takes days for chains to converge for a *single model* for particularly ugly problems). I digress...

Aline wrote: I noticed while it was running that it was only using one core of my multi-core processor. When it got past the simulations to the logistic regression part it used all cores. It seems to me (admittedly, not the most computer-literate person) that the simulations would be perfect candidates for parallel processing. Is there any way to do that?


I can't replicate the observation, but independent of that:

1. No. There is no 'flag to flip' to get MARK to run simulations one per core (which is more or less what I suspect you're after).

2. If your model is pretty 'general', you might be able to save yourself some time (at least, initially) by looking at the 'newly implemented' Fletcher c-hat, which is produced 'instantly' (since it is based on simple metrics summing over observed frequencies of the different encounter histories). See section 5.8 of chapter 5. What I would probably do is look at the Fletcher c-hat, assume it is pretty close to 'truth', and then start my median c-hat analysis based on that. So, for example, if the Fletcher c-hat is reported as (say) 1.25, I'd run the median c-hat with an upper bound of 1.5, a lower bound of 1.0, 10 design points, and 25 replicates for each design point. Or some such.
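To make the size of that run concrete, here is a minimal sketch of the design implied by those settings. It assumes the design points are spaced evenly between the lower and upper bounds; the function name `design_points` is hypothetical and is not part of MARK or RMark.

```python
def design_points(lower, upper, n_points):
    """Evenly spaced c values from lower to upper, both ends included.

    Assumption: even spacing between the bounds (illustrative only).
    """
    if n_points < 2:
        raise ValueError("need at least two design points")
    step = (upper - lower) / (n_points - 1)
    return [round(lower + i * step, 4) for i in range(n_points)]


fletcher_chat = 1.25          # example value from the post
lower, upper = 1.0, 1.5       # bounds bracketing the Fletcher c-hat
replicates = 25               # replicates per design point

points = design_points(lower, upper, 10)
total_sims = len(points) * replicates

print(points)
print("simulated data sets to fit:", total_sims)
```

With these settings the model has to be fitted to 250 simulated data sets, which is where the run time goes for a slow model.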

The main caveat to the Fletcher c-hat as a 'robust estimator' is given in the following (pulled out of Chapter 5) - in particular, if you have one or more transition probabilities fixed to zero.

While the Fletcher c-hat shows considerable promise, several problems can cause this estimate to be incorrect. First, losses on capture or dots in the encounter history will create encounter histories that are not considered in the total number of possible encounter histories. That is, the total number of possible encounter histories is based on no missing data. Second, parameter values that cause a reduction in the total number of encounter histories will bias the c-hat estimate. Examples of such reductions are an occasion in the CJS data type with p=0, or transition probabilities fixed to 0 or 1 in the multi-state data types.


Having said that, for near-full-time-dependent models, I've done some less-than-exhaustive comparisons of median c-hat and the Fletcher c-hat, and they compare very well. Your results may vary, of course.
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Parallel simulations for median c-hat?

Post by Aline » Thu Jan 21, 2016 6:10 am

cooch wrote: wait until you go Bayesian, and it takes days for chains to converge for a *single model* for particularly ugly problems


Been there, done that!

cooch wrote: compute time is cheap, relative to the cost of collecting the data


I don't disagree, but that's no reason to waste it. It kind of feels like telling one technician to do a huge job while all the other technicians sit around and watch. So I figured it was at least worth asking.

cooch wrote: no 'flag to flip' to get MARK to run simulations, one per core


Thank you. At least now I know I won't find out a few months from now that I "should've just ticked that box."

I was steering away from the Fletcher c-hat precisely because I do have a multistate model with some transitions fixed to zero. I might have a look at it anyway, just to see what it comes up with.

Re: Parallel simulations for median c-hat?

Post by Aline » Thu Jan 21, 2016 10:03 am

I've been thinking about this and realized that the easiest solution is probably just to run several tests simultaneously (in separate windows) using fewer replicates, tick the box for getting the simulation output in an Excel file, and then combine the outputs myself afterwards and run the logistic regression.

At the moment I'm not seeing any reason why this shouldn't work...
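A minimal sketch of the "combine and regress" step, under stated assumptions. It assumes each run can be reduced to rows of (simulated c, indicator), where the indicator is 1 when the deviance c-hat of the simulated data set exceeds the observed one, and that the median c-hat is the value of c at which the fitted probability equals 0.5. The row layout, the numbers, and the function names are hypothetical; they do not reproduce the layout of MARK's exported spreadsheet.

```python
import math


def fit_logistic(xs, ys, max_iter=100, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient wrt intercept
            g1 += (y - p) * x      # gradient wrt slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix")
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def median_chat(rows):
    """Value of c where the fitted probability is 0.5 (c centred for stability)."""
    cs = [c for c, _ in rows]
    ys = [y for _, y in rows]
    mean_c = sum(cs) / len(cs)
    b0, b1 = fit_logistic([c - mean_c for c in cs], ys)
    return mean_c - b0 / b1


def rows_from_counts(counts, n):
    """Expand {c: number of 1s} into n (c, 0/1) rows per design point."""
    rows = []
    for c, k in counts.items():
        rows += [(c, 1)] * k + [(c, 0)] * (n - k)
    return rows


# Two hypothetical runs ("windows"), 5 replicates per design point each.
run_a = rows_from_counts({1.0: 0, 1.1: 1, 1.2: 2, 1.3: 3, 1.4: 4, 1.5: 5}, 5)
run_b = rows_from_counts({1.0: 1, 1.1: 1, 1.2: 2, 1.3: 3, 1.4: 4, 1.5: 4}, 5)

pooled = run_a + run_b            # combining runs is just stacking the rows
estimate = median_chat(pooled)
print(round(estimate, 3))
```

Because the replicates are independent simulations, stacking the rows from separate runs gives the same regression input as one long run with the same total number of replicates per design point.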

Re: Parallel simulations for median c-hat?

Post by gwhite » Thu Jan 21, 2016 10:52 am

Each numerical optimization should be running on multiple cores if you are running the current version of MARK. The GUI does not use multiple cores, but this part of the median c-hat is generally a small part of the process. So each time mark32.exe or mark64.exe starts up, you should see multiple threads running in the Task Monitoring window. If not, check the number of threads you have set in the Files | Preferences window.

What you have proposed above will work, but if your numerical optimization benefits from multiple cores, then you really don't need to go to that extreme.

Also, when you have few encounter histories, multiple threads do not speed up the run, and can even slow it down. The only place where multiple cores help in MARK is the processing of the encounter histories loop. When the number of encounter histories is small, there generally is no benefit to splitting them across multiple cores because of the overhead.
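A toy cost model shows the shape of that trade-off. Everything in it is an assumption for illustration: the function `run_time`, the fixed per-thread overhead, and the constants are made up and were not measured from MARK.

```python
def run_time(n_histories, n_threads, per_history=1.0, per_thread_overhead=50.0):
    """Toy model: loop work is split across threads, each extra thread adds a fixed cost.

    Units are arbitrary; the constants are hypothetical.
    """
    work = per_history * n_histories / n_threads
    overhead = per_thread_overhead * (n_threads - 1)
    return work + overhead


for n_histories in (100, 100_000):
    single = run_time(n_histories, 1)
    multi = run_time(n_histories, 8)
    verdict = "faster" if multi < single else "slower"
    print(f"{n_histories} histories: 1 thread = {single}, 8 threads = {multi} ({verdict})")
```

In this model the overhead dominates when there are few histories, so eight threads lose to one; with many histories the split work dominates and threading wins.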

Gary
gwhite
 
Posts: 340
Joined: Fri May 16, 2003 9:05 am

Re: Parallel simulations for median c-hat?

Post by Aline » Thu Jan 21, 2016 11:41 am

Thanks for your reply, Gary. I'll definitely have to do some more experimenting with this later. It was only showing a single core in use during my test run. Now I have it split up in separate windows with fewer replicates, and the CPU is being used much more efficiently.

I did check that it was set to use the maximum number of threads, so that wasn't the problem.

I'm going to let it continue running in the way I've set it up now, and I'll see how long it takes compared to what I did before. Once it's done I'll try to play around with it some more to see if I can figure out what's going on.

