Data Cloning Caution

Postby B.K. Sandercock » Thu Sep 01, 2011 11:52 am

One of the new features of Program MARK is the Data Cloning tool, a numerical approach for determining whether the parameters in a model are identifiable. For example, CJS models can have nonidentifiable parameters when both apparent survival (phi) and the encounter rate (p) are time-dependent. In a model such as phi(t), p(t), the product of phi and p for the last transition/occasion cannot be decomposed without additional information to estimate p for the last occasion (pages 3-13, 4-71 of TFM).

The Data Cloning tool can be accessed in MARK through Output | Specific Model Output | Data Cloning. It can be applied to any model in the candidate set, and by default the encounter histories are cloned by a factor of 100.

A couple of observations after working with the tool.

First, the useful diagnostic for whether a parameter is estimable is the SE ratio: the SE from the original model divided by the SE from the data-cloned model. If a factor of 100 is used, the SE ratio for estimable parameters should be ~10, presumably because SE = SD / sqrt(N), so 100 times the data shrinks the SE by sqrt(100) = 10. Nonestimable parameters should have an SE ratio other than 10. For example, in the dipper example with phi(g*t), p(g*t) and the sin link, there is a DIV/0 error message for the nonestimable parameters.

Second, the SE ratio is sensitive to which link function is used. For example, the dipper example has a year near the start of the time series where the estimate of p for males is close to the boundary of one. If the model phi(g*t), p(g*t) is run with the sin vs. the logit link, the parameter at the boundary is successfully estimated and tallied with the sin link but not with the logit link. Running the data cloning tool on the two models gives different results: an SE ratio of zero under the sin link, and 9.3 under the logit link.

Last, one of my students tried the data cloning tool with a closed population model where abundance (N) was included as an estimated parameter. For probabilities like phi and p, data cloning affects the SE but not the point estimates. For N, because data cloning increases the number of encounter histories for marked individuals, the point estimate of N will also increase, not just the SE ratio. A caution for the data cloning tool: perhaps it should only be used for parameters that are probabilities bounded 0-1.
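As a rough illustration of why an estimable probability should give an SE ratio of about 10 with 100 clones, here is a minimal Python sketch. It uses a plain binomial proportion as a stand-in for phi or p, which is an assumption made for simplicity; it is not the CJS likelihood and not MARK's own computation, and the function names are hypothetical.

```python
import math

def binomial_se(successes, trials):
    """MLE and standard error of a simple binomial proportion."""
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, se

def clone_se_ratio(successes, trials, k=100):
    """Clone the data k times and return both estimates plus SE(original) / SE(cloned)."""
    p_orig, se_orig = binomial_se(successes, trials)
    p_clone, se_clone = binomial_se(k * successes, k * trials)
    return p_orig, p_clone, se_orig / se_clone

# Made-up data: 30 detections in 50 trials, cloned 100 times.
p_orig, p_clone, ratio = clone_se_ratio(30, 50, k=100)
print(p_orig, p_clone, ratio)
```

The point estimate is unchanged by cloning, while the SE shrinks by sqrt(100), so the ratio comes out at 10 up to floating-point rounding.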

Brett K. Sandercock
B.K. Sandercock
 
Posts: 48
Joined: Mon Jun 02, 2003 4:18 pm
Location: Norwegian Institute of Nature Research

Re: Data Cloning Caution

Postby cooch » Thu Sep 01, 2011 12:16 pm

Brett is referring to a feature in MARK which hasn't been documented yet. I have a 'late-beta' chapter for the book documenting the cloning feature (which covers the points Brett raises, plus several other things -- as well as the usual step-by-step details on the 'mechanics'). This chapter should be released for general consumption early next week, if not sooner.
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Data Cloning Caution

Postby Eurycea » Fri Sep 02, 2011 2:53 pm

On Gary White's website it says:

The confidence intervals can also be compared, and the use of profile likelihood confidence intervals is suggested for examining parameter estimates at boundaries. That is, a parameter at a boundary, e.g., a survival estimate equal to 1, will generally have a zero (or at least unrealistically small) standard error. Cloning the data does not change this small standard error. However, if you have computed profile likelihood confidence intervals for this parameter, the profile likelihood confidence intervals for the cloned data will be considerably shorter (assuming you clone a 100 copies) than the original data. So, data cloning is also useful for verifying that a parameter estimated at the boundary is also estimable.


I don't understand how profile CIs for the cloned data being considerably shorter than the original data verify that a parameter at the boundary is estimable. What does it look like if it is unestimable? Can someone explain the logic here? Thanks
Eurycea
 
Posts: 103
Joined: Thu Feb 25, 2010 11:21 am

Re: Data Cloning Caution

Postby cooch » Fri Sep 02, 2011 4:19 pm

B.K. Sandercock wrote: One of the new features of Program MARK is the tool for Data Cloning, which is a numerical approach for determining if parameters in a model are identifiable or not. For example, CJS models can have nonidentifiable parameters in models with time-dependence in apparent survival (phi) and the encounter rate (p). In a model such as phi(t), p(t) it is not possible to decompose the product of phi and p for the last transition/occasion without additional information to estimate p for the last occasion (pages 3-13, 4-71 of TFM).

The Data Cloning tool can be accessed in MARK through Output | Specific Model Output | Data Cloning. The tool can be applied to any model in the candidate set and the default option for cloning the encounter histories is by a factor of 100.


I've just posted a late draft of the 'chapter' (appendix F, in fact) on data cloning in MARK. You can access it here:

http://www.phidot.org/software/mark/doc ... /app_6.pdf

Be advised that data cloning (and associated chapter) is a work in progress, and that you can't expect it to address all problems (with confounding and estimability) for all data types.

A couple of observations after working with the tool. First, the useful diagnostic for whether a parameter is estimable is the SE ratio between the SE of the original model vs. the data cloned model. If a factor of 100 is used, the SE ratio for estimable parameters should be ~10, presumably because SE = SD / sqrt(N)? Nonestimable parameters should have an SE ratio other than 10. For example, in the dipper example with phi(g*t), p(g*t) with the sin link, there is a DIV/0 error message for the nonestimable parameters.


A fair summary -- much more detail in the chapter.
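To make the idea of 'confounded' parameters concrete, here is a small Python sketch for the simplest case where only the product phi*p enters the likelihood, as with the last occasion of phi(t), p(t). It is a toy under stated assumptions: a single binomial term with made-up counts, hypothetical function names, and no connection to how MARK computes or reports anything. It evaluates the 2x2 observed information matrix at a point on the ridge phi*p = y/n and shows that its determinant is zero both before and after cloning, so no finite SE can be obtained by inverting it.

```python
def information_matrix(y, n, phi, p):
    """Observed information for log L = y*log(phi*p) + (n - y)*log(1 - phi*p)."""
    beta = phi * p
    d1 = y / beta - (n - y) / (1.0 - beta)            # d logL / d beta
    d2 = -y / beta ** 2 - (n - y) / (1.0 - beta) ** 2  # d2 logL / d beta2
    i_phi_phi = -d2 * p * p
    i_p_p = -d2 * phi * phi
    i_phi_p = -(d2 * phi * p + d1)
    return i_phi_phi, i_phi_p, i_p_p

def relative_determinant(y, n, phi, p):
    """Determinant of the information matrix, scaled by the product of its diagonal."""
    a, b, d = information_matrix(y, n, phi, p)
    return (a * d - b * b) / (a * d)

# Made-up data: y = 30 of n = 50, so any (phi, p) with phi*p = 0.6 maximises the likelihood.
rel_det_original = relative_determinant(30, 50, 0.8, 0.75)
rel_det_cloned = relative_determinant(3000, 5000, 0.8, 0.75)  # same data cloned 100 times
print(rel_det_original, rel_det_cloned)
```

Both values are zero up to rounding error: cloning multiplies the information matrix by 100 but cannot remove the flat direction along the ridge.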

Second, the SE ratio is sensitive to which link function is used. For example, the dipper example has a year near the start of the time series where an estimate of p for males is close to the boundary of one. If the model phi(g*t), p(g*t) is run with the sin vs. the logit link, the parameter at the boundary is successfully estimated and tallied with the sin link but not the logit link. Running the data cloning tool on the two models gives different results, an SE ratio of zero under the sin link, and 9.3 under the logit link.


The link function is in fact not particularly an issue. This particular example (the ubiquitous dipper) is considered in the chapter. Basically, the SE ratio is used for identifying 'confounded' parameters, and is not relevant to the problem of parameters estimated near the boundaries. For that sort of problem, we compare the profile likelihood CIs between the original and cloned data sets, which (as you'll see) are not influenced (as far as I/we can find) by the choice of link function.

Last, one of my students tried the data cloning tool with a closed population model where abundance (N) was included as an estimated parameter. For probabilities like phi and p, data cloning affects the SE ratio but not the point estimates. For N, because data cloning increases the number of encounter histories for marked individuals, the point estimate of N will increase and not just the SE ratio. A caution for the data cloning tool is perhaps it should only be used for parameters that are probabilities bounded 0-1.


A good point -- covered in a bit of detail in section F.3 in the chapter.
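The effect on N can be seen with a back-of-the-envelope Python sketch. It swaps in the two-sample Lincoln-Petersen estimator as a simple stand-in; MARK's closed-population models are likelihood-based and more general, so this is only an assumption-laden illustration with made-up counts and hypothetical function names.

```python
def lincoln_petersen(n1, n2, m2):
    """Two-sample abundance estimate: n1 marked, n2 caught later, m2 of those marked."""
    return n1 * n2 / m2

def clone_counts(n1, n2, m2, k=100):
    """Cloning every encounter history k times multiplies each count by k."""
    return k * n1, k * n2, k * m2

n_orig = lincoln_petersen(40, 50, 10)
n_clone = lincoln_petersen(*clone_counts(40, 50, 10, k=100))
print(n_orig, n_clone, n_clone / n_orig)
```

Unlike a probability, the abundance estimate scales with the number of clones (here by a factor of 100), so the original and cloned estimates of N are not directly comparable.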
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Data Cloning Caution

Postby cooch » Fri Sep 02, 2011 4:20 pm

Eurycea wrote: On Gary White's website it says:

The confidence intervals can also be compared, and the use of profile likelihood confidence intervals is suggested for examining parameter estimates at boundaries. That is, a parameter at a boundary, e.g., a survival estimate equal to 1, will generally have a zero (or at least unrealistically small) standard error. Cloning the data does not change this small standard error. However, if you have computed profile likelihood confidence intervals for this parameter, the profile likelihood confidence intervals for the cloned data will be considerably shorter (assuming you clone a 100 copies) than the original data. So, data cloning is also useful for verifying that a parameter estimated at the boundary is also estimable.


I don't understand how profile CIs for the cloned data being considerably shorter than the original data verify that a parameter at the boundary is estimable. What does it look like if it is unestimable? Can someone explain the logic here? Thanks


For the same reason the variance estimate for a parameter is smaller with bigger sample sizes.
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Data Cloning Caution

Postby cooch » Fri Sep 02, 2011 6:46 pm

cooch wrote:
Eurycea wrote: On Gary White's website it says:

The confidence intervals can also be compared, and the use of profile likelihood confidence intervals is suggested for examining parameter estimates at boundaries. That is, a parameter at a boundary, e.g., a survival estimate equal to 1, will generally have a zero (or at least unrealistically small) standard error. Cloning the data does not change this small standard error. However, if you have computed profile likelihood confidence intervals for this parameter, the profile likelihood confidence intervals for the cloned data will be considerably shorter (assuming you clone a 100 copies) than the original data. So, data cloning is also useful for verifying that a parameter estimated at the boundary is also estimable.


I don't understand how profile CIs for the cloned data being considerably shorter than the original data verify that a parameter at the boundary is estimable. What does it look like if it is unestimable? Can someone explain the logic here? Thanks


For the same reason the variance estimate for a parameter is smaller with bigger sample sizes.


More specifically, if the parameter is estimable, the likelihood will have a maximum, and the profile CI will be narrower when the data are cloned. If there is no maximum, then cloning won't change the CI much at all, since the calculation of the profile CI depends on the shape of the likelihood: if the likelihood is, say, truly flat, then cloning the data won't change it much.

Short form answer.
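A toy Python sketch of this logic, under simplifying assumptions: there is a single parameter, so the profile likelihood is just the likelihood; the 'estimable at the boundary' case is a detection probability where all n animals were detected (log L = n*log(p), maximised at p = 1); and the 'nonestimable' case is an exactly flat log-likelihood. The grid search and all function names are hypothetical and unrelated to MARK's own routine.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95% quantile of chi-square with 1 df

def likelihood_interval(loglik, grid):
    """Range of grid values whose deviance from the maximum is within the cutoff."""
    values = [loglik(theta) for theta in grid]
    best = max(values)
    inside = [t for t, v in zip(grid, values) if 2.0 * (best - v) <= CHI2_95_1DF]
    return min(inside), max(inside)

def boundary_loglik(n):
    """All n animals detected: the estimate sits on the boundary p = 1."""
    return lambda p: n * math.log(p)

def flat_loglik(n):
    """The data carry no information about p, however many copies there are."""
    return lambda p: 0.0

grid = [i / 10000 for i in range(1, 10001)]  # 0.0001, 0.0002, ..., 1.0

lo_orig, hi_orig = likelihood_interval(boundary_loglik(20), grid)
lo_clone, hi_clone = likelihood_interval(boundary_loglik(2000), grid)   # 100 clones
flat_orig = likelihood_interval(flat_loglik(20), grid)
flat_clone = likelihood_interval(flat_loglik(2000), grid)

print(hi_orig - lo_orig, hi_clone - lo_clone)  # width shrinks a lot after cloning
print(flat_orig, flat_clone)                   # whole grid both times
```

For the boundary estimate the interval becomes far shorter once the data are cloned, whereas for the flat likelihood the interval spans the whole grid before and after cloning.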
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Data Cloning Caution

Postby Eurycea » Thu Sep 08, 2011 10:03 am

Thanks cooch, this is very useful. I can now test those boundary estimates I have!!!
Eurycea
 
Posts: 103
Joined: Thu Feb 25, 2010 11:21 am

