I haven't looked at this specifically in detail, so can't say for certain, but in the original MacKenzie and Bailey 2004 paper we conducted simulations, and in some scenarios some of the expected values were getting really small (<0.005) and most were <2. When there was no lack of fit, the test procedure had the correct size (alpha level), so it would seem that, even without pooling, the bootstrap procedure with small expected values doesn't give us too many false rejections of the null hypothesis.
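To make the idea concrete, here is a minimal sketch (not the actual MacKenzie and Bailey procedure, and the cell probabilities are made up for illustration) of why small expected values need not break the test: the parametric bootstrap builds the reference distribution of Pearson's X² empirically by simulating under the fitted model, rather than relying on the chi-square approximation that pooling is meant to rescue.

```python
import numpy as np

rng = np.random.default_rng(42)

def pearson_chi2(observed, expected):
    # Pearson's X^2, with no pooling of small expected cells
    return np.sum((observed - expected) ** 2 / expected)

# Hypothetical example: 100 observations over 6 categories, with some
# cell probabilities so small that expected values fall well below 2.
probs = np.array([0.50, 0.30, 0.15, 0.03, 0.015, 0.005])
n = 100
observed = rng.multinomial(n, probs)
expected = n * probs  # smallest expected value here is 0.5

x2_obs = pearson_chi2(observed, expected)

# Parametric bootstrap: simulate datasets under the (here, known) model
# and compare the observed statistic to the simulated distribution.
B = 2000
x2_boot = np.array([
    pearson_chi2(rng.multinomial(n, probs), expected)
    for _ in range(B)
])
p_value = np.mean(x2_boot >= x2_obs)
print(p_value)
```

Because the data here are generated under the null, the p-value should be roughly uniform over repeated runs; the point is that no chi-square distributional assumption is invoked, so tiny expected counts only matter through how well B bootstrap samples capture those rare cells.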
Whether this still holds when there are a lot of missing values (which we didn't assess in the paper) or really sparse data, I don't know. My gut feeling is that the bootstrap should do a reasonable job of it provided you have enough bootstrap samples to adequately sample those really rare instances, but perhaps there are occasions when pooling does help. If anyone out there is looking for an Honours project in this general area, this might be a good one.
