PC spec

questions concerning analysis/theory using program MARK

PC spec

Postby Mark Trinder » Fri Jun 23, 2006 5:10 am

Hi all

just a quick question:
I'm looking for recommendations on the PC spec I should aim for to improve run times for MARK analyses.

I know Evan has provided guidance here on minimum specs, and while the concept of a maximum spec is a bit meaningless, it would be useful to know what I should be aiming for: what spec have others upgraded to, and what would they advise given their experience (e.g. which is more important: more RAM or a faster processor? Obviously both, but there is going to be a trade-off here)? I also need to be able to communicate this to our IT bods, so techy details are good :-)

For background, I am currently running multistate (MS) models for a 40+ year dataset with 4000+ individuals (giving an effective sample size of 10,789) on a 2 GHz Pentium with 1.5 GB RAM, and even models with time included in only one of S, p, or Psi can take in excess of 24 hours to run. I have been largely unable to increase model complexity further, as R (I am using RMark to build my models) crashes, presumably due to the large design matrices involved.

thanks
Mark
Mark Trinder
 
Posts: 17
Joined: Tue Oct 28, 2003 7:42 am
Location: Slimbridge, UK

Computer spec

Postby jlaake » Fri Jun 23, 2006 11:43 am

With regard to computer specs, I can offer the following. Increasing RAM will help with problem size, and increasing processor speed will improve run times. I'm running CJS models with 19 years and 8000+ capture histories, and runs can push 12-15 hours on a 2.8 GHz machine with 2 GB RAM. Multistrata models are that much slower because they have to compute all the transition probabilities, so run time will depend on the number of states in the design. Using dual processors will not speed up a single run, but it will let you run 2 jobs simultaneously at the same execution speed, so it effectively halves the time compared to running the jobs sequentially.

I did some comparisons between AMD and Intel processors of similar speed and found little difference. At one point I was misled into thinking there were fairly large differences; as it turned out, I was using two different versions of MARK.EXE: the more recent one, which automatically standardizes the design matrix, and an older one that did not. The newer version took considerably longer with the problem I was comparing. Gary sent me a version that allows you to add a switch so that it does not standardize the design matrix, but I'm not certain it is in the version posted on his site.

With regard to RMark, I have yet to find a case where RMark model creation crashed R. Send me an email with the specs, as there may be a bug. R is limited to using 1 GB of RAM, so it probably won't help to go much beyond 2 GB. Here are some thoughts for you. R keeps everything in the workspace in memory, so if your workspace is getting really large you may want to remove any unneeded objects. I have gotten my workspace so large that I was unable to open it after saving it; that is why I started saving the MARK output files in the directory rather than in the workspace.

Second, if you have some parameters (like Psi) that are only going to be constant or time-dependent (within group/strata/tostrata), then you can choose that pim.type in the parameter specification of make.design.data instead of using the default, which is all-different. That is what Brad Stith is doing with some of his large multi-strata designs.

With regard to run times, it should help to use the initial argument to start models close to the MLE. The initial argument can take another model as its value, and it will identify the parameters in common. I fit a fairly general model and then use it as starting values for others. Sometimes it helps and sometimes not, depending on how close the models are. A rough sketch of both ideas is below.
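For illustration only, here is a minimal sketch of those two suggestions in RMark. This is not from an actual analysis: the data and object names (ms.data, ms.proc, ms.ddl, etc.) are placeholders, and the formulas are just examples.

    library(RMark)

    # process the capture histories as a multistrata analysis
    ms.proc <- process.data(ms.data, model = "Multistrata")

    # restrict Psi to a constant PIM type instead of the default
    # all-different structure, which keeps the design data small
    ms.ddl <- make.design.data(ms.proc,
                  parameters = list(Psi = list(pim.type = "constant")))

    # fit a fairly general model first...
    general <- mark(ms.proc, ms.ddl,
                  model.parameters = list(S = list(formula = ~time),
                                          p = list(formula = ~time)))

    # ...then use its estimates as starting values for a related model
    reduced <- mark(ms.proc, ms.ddl,
                  model.parameters = list(S = list(formula = ~1),
                                          p = list(formula = ~time)),
                  initial = general)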
jlaake
 
Posts: 1480
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Computer spec

Postby cooch » Fri Jun 23, 2006 2:57 pm

jlaake wrote: With regard to computer specs, I can offer the following. Increasing RAM will help with problem size, and increasing processor speed will improve run times. I'm running CJS models with 19 years and 8000+ capture histories, and runs can push 12-15 hours on a 2.8 GHz machine with 2 GB RAM. Multistrata models are that much slower because they have to compute all the transition probabilities, so run time will depend on the number of states in the design.


Even worse: you should also be extremely wary of local minima for MS models. Coming up with good starting values can help, but that is non-trivial. Moreover, the numerical 'solution' to the problem (the alternate optimization option in MARK, which is in fact simulated annealing) takes much longer than the standard Newton-Raphson approach (although, on the plus side, I've yet to see it converge to anything other than the global minimum). I had one large problem that took ~3 days to finish, and that was for one model. So, be warned. That's part of the reason I recently did a massive hardware upgrade.
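If you want to try the simulated annealing optimizer from RMark, something along these lines should work. This is a sketch only, reusing the placeholder objects from the earlier example, and assumes the options argument of mark() is passed through to MARK's estimation options:

    # request MARK's alternate (simulated annealing) optimizer;
    # slower than Newton-Raphson, but more robust to local minima
    sa.model <- mark(ms.proc, ms.ddl,
                  model.parameters = list(S = list(formula = ~time)),
                  options = "SIMANNEAL")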

But, in fairness, I think we get seduced by the speed with which computers give us answers for simple problems, and expect complex problems to be solved just as quickly. Full-blown Bayesian random effects models (for example) can take huge amounts of compute time to reach decent convergence. So be it. This is not unusual: my colleagues in informatics, and in some areas of non-linear physics, have jobs that run for days and days on massively parallel machines, or on serial hot-rods like the classic Crays. So, perhaps a few hours (or even days) waiting for MARK (or anything else) to finish isn't so bad.
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

