www.phidot.org

by **markmiller** » Mon Aug 25, 2014 5:50 am

I am attempting to develop a better understanding of the SIN link used by Program MARK. In particular, I am wondering whether the SIN link can be used with glm in R to estimate proportions and compare proportions among groups (assuming p = 1). The SIN link is not available as a built-in option in glm.

I can find little in the published literature about the SIN link. I have posted a couple of questions on Stack Overflow and Cross Validated, but have gotten little feedback.

Where can I read about the SIN link in the published literature? At one time there was a pdf devoted to the statistics of Program MARK. Is that document still available? I suspect the source code of RMark does not implement the SIN link directly, but rather instructs MARK to implement it. Does someone have R code for implementing the SIN link in glm?
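One possible route, sketched here under assumptions rather than taken from MARK or RMark: `glm` families accept a user-built object of class `"link-glm"`. The object name `sin_link` and the toy data below are hypothetical. The link uses the form (sin(eta) + 1) / 2 and is only sensible for identity-type design matrices, not continuous covariates.

```r
# Hypothetical helper (not part of base R, MARK, or RMark): a "link-glm"
# object for the sine link p = (sin(eta) + 1) / 2.
sin_link <- structure(
  list(
    linkfun  = function(mu)  asin(2 * mu - 1),    # eta as a function of p
    linkinv  = function(eta) (sin(eta) + 1) / 2,  # p as a function of eta
    mu.eta   = function(eta) cos(eta) / 2,        # derivative dp/d(eta)
    valideta = function(eta) TRUE,
    name     = "sin"
  ),
  class = "link-glm"
)

# Illustrative data: 8 successes in 10 Bernoulli trials
y <- c(0, 0, rep(1, 8))

# Intercept-only model, so beta is a single sine-scale parameter
fit <- glm(y ~ 1, family = binomial(link = sin_link))
beta_hat <- unname(coef(fit))
p_hat <- (sin(beta_hat) + 1) / 2   # back-transform to a proportion
```

For this toy data the back-transformed estimate should agree with the raw proportion 8/10. Behaviour with non-identity design matrices has not been checked here.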

Here are links to the above-mentioned Stack Overflow and Stack Exchange questions if interested:

http://stackoverflow.com/questions/2543 ... regression

http://stats.stackexchange.com/question ... regression

http://stats.stackexchange.com/question ... a-sin-link

Thank you for any guidance. I hope this question is not off-topic.

by **markmiller** » Tue Aug 26, 2014 6:42 am

Instead of trying to implement the SIN link in glm in R, I decided to write the negative log-likelihood for the logit link and minimize it with optim. Then I repeated that approach, simply substituting in the SIN link. Below is the R code for both. Both give the correct point estimate (0.8) to three decimal places. The logit link came closer (0.8000017 versus 0.800162). I was expecting better performance from the SIN link than what I observed. Perhaps I made an error somewhere.

I will update this thread if I learn more.

Code:

    my.data <- read.table(text='
      y x
      0 1
      0 1
      1 1
      1 1
      1 1
      1 1
      1 1
      1 1
      1 1
      1 1
    ', header = TRUE)

    # create the design matrices
    vY = as.matrix(my.data$y)
    mX = as.matrix(my.data$x)

    # negative log-likelihood under the logit link
    logLikelihoodLogit = function(vBeta, mX, vY) {
      vP = 1 / (1 + exp(-(mX %*% vBeta)))
      return(-sum(vY * log(vP) + (1 - vY) * log(1 - vP)))
    }

    # negative log-likelihood under the SIN link
    # (uses %*% like the logit version, so it also works with several columns)
    logLikelihoodSin = function(vBeta, mX, vY) {
      vP = (sin(mX %*% vBeta) + 1) / 2
      return(-sum(vY * log(vP) + (1 - vY) * log(1 - vP)))
    }

    # set initial parameter values
    vBeta0 = c(0.5)  # arbitrary starting parameters

    # minimize the negative log-likelihood for logit link
    optimLogit = optim(vBeta0, logLikelihoodLogit, mX = mX, vY = vY,
                       method = 'BFGS', hessian = TRUE)
    optimLogit$par
    # [1] 1.386305
    (1/(1+exp(-(optimLogit$par))))
    # [1] 0.8000017

    # minimize the negative log-likelihood for SIN link
    optimSin = optim(vBeta0, logLikelihoodSin, mX = mX, vY = vY,
                     method = 'BFGS', hessian = TRUE)
    optimSin$par
    # [1] 0.6439062
    (sin(optimSin$par) + 1) / 2
    # [1] 0.800162
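As a cross-check on the optimizer output, the one-parameter case has a closed-form answer: the maximum-likelihood proportion is the sample mean, and each link's beta is that proportion pushed through the link function. A minimal sketch, with variable names chosen here for illustration:

```r
# Same data as above: 2 failures, 8 successes
y <- c(0, 0, rep(1, 8))

p_hat <- mean(y)                          # MLE of the proportion
beta_logit <- log(p_hat / (1 - p_hat))    # logit(p_hat)
beta_sin   <- asin(2 * p_hat - 1)         # inverse of (sin(beta) + 1) / 2

beta_logit
# [1] 1.386294
beta_sin
# [1] 0.6435011
```

Against these values, the BFGS results differ in the fifth decimal place for the logit link and in the fourth for the SIN link, which looks like an optimizer tolerance effect rather than a property of either link.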

by **markmiller** » Tue Aug 26, 2014 8:57 am

I forgot that when only estimating one parameter it is better to use `method='Brent'`.

When I do that I obtain better point estimates. Both links now return a point estimate matching the expected value:

Code:

    optimLogit.b = optim(vBeta0, logLikelihoodLogit, mX = mX, vY = vY,
                         method = 'Brent', lower = -20, upper = 20)
    optimLogit.b
    (1/(1+exp(-(optimLogit.b$par))))
    # [1] 0.8

    optimSin.b = optim(vBeta0, logLikelihoodSin, mX = mX, vY = vY,
                       method = 'Brent', lower = -20, upper = 20)
    optimSin.b
    (sin(optimSin.b$par) + 1) / 2
    # [1] 0.8

It might be okay to use `lower = 0, upper = 1` with `optimSin.b`.
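A quick way to judge candidate bounds for the SIN link is to look at which proportions they can reach. The sketch below uses a hypothetical helper `sin_inv`. It suggests that beta in [0, 1] only covers proportions from 0.5 to roughly 0.92, which happens to include 0.8, while [-pi/2, pi/2] covers the whole unit interval. It also shows that on a wide interval such as [-20, 20] several betas map to the same proportion, because the sine is periodic.

```r
# Hypothetical helper: inverse SIN link as used in this thread
sin_inv <- function(beta) (sin(beta) + 1) / 2

range_01   <- sin_inv(c(0, 1))             # proportions reachable on [0, 1]
range_half <- sin_inv(c(-pi / 2, pi / 2))  # proportions reachable on [-pi/2, pi/2]

# Betas one full period apart give the same proportion
same_p <- sin_inv(asin(0.6) + 2 * pi * (-1:1))

range_01     # about 0.5 and 0.92
range_half   # 0 and 1
same_p       # all about 0.8
```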

by **jlaake** » Tue Aug 26, 2014 11:26 am

It sounds like you have this under control. The sin link and all of the links in RMark are in inverse.link and the derivatives are in deriv.inverse.link. They are used to compute real parameters. Those functions are not exported by RMark, so to use or see them type `RMark:::inverse.link` or `RMark:::deriv.inverse.link`.

--jeff

by **markmiller** » Tue Aug 26, 2014 11:49 am

Thank you, Jeff. Some heavy hitters over at Cross Validated are asking me why anyone would ever want to use the sine link.

http://stats.stackexchange.com/question ... regression

I am in a little over my head.

I tried suggesting that the sine link performs better than the logit link at boundaries (probability is 0 or 1). But I think that suggestion was dismissed.

I also said the sine link is more-or-less the default in MARK if there are no covariates, i.e., if there is only one entry per row in the design matrix (regardless of whether that matrix is an identity matrix). I agreed that problems could arise if you use the sine link along with covariates.

I think the sine link can be used to compare probabilities among groups provided there is no intercept. For example, you could compare the probability of getting a 'head' from three different coins each flipped 20 times (for example, by setting p = 1 in MARK).
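The coin comparison can be sketched in plain R as follows. The head counts are made-up illustration data, the function names are hypothetical, and the likelihood-ratio test against a common-probability model is one possible way to do the comparison, not something taken from MARK. Each coin gets its own sine-scale beta, which corresponds to an identity design matrix.

```r
# Hypothetical data: heads out of 20 flips for three coins
heads <- c(8, 12, 16)
flips <- c(20, 20, 20)

# Negative log-likelihood with one sine-link beta per coin
nll_sin <- function(beta, heads, flips) {
  p <- (sin(beta) + 1) / 2
  -sum(dbinom(heads, flips, p, log = TRUE))
}

# Separate probability for each coin
fit_sep <- optim(c(0, 0, 0), nll_sin, heads = heads, flips = flips,
                 method = "BFGS", control = list(reltol = 1e-12))
p_hat <- (sin(fit_sep$par) + 1) / 2

# One common probability for all coins (single beta reused for every coin)
nll_common <- function(beta, heads, flips) {
  nll_sin(rep(beta, length(heads)), heads, flips)
}
fit_common <- optim(0, nll_common, heads = heads, flips = flips,
                    method = "Brent", lower = -pi / 2, upper = pi / 2)

# Likelihood-ratio test: 3 parameters versus 1, so 2 degrees of freedom
lrt <- 2 * (fit_common$value - fit_sep$value)
p_value <- pchisq(lrt, df = 2, lower.tail = FALSE)
```

With an identity design matrix the back-transformed estimates should simply reproduce the observed proportions, so the link mainly affects the numerical behaviour rather than the estimates.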

Anyway, if anyone feels inclined, please feel free to tell the folks at Cross Validated why anyone would ever want to use the sine link! I am not entirely sure I know the answer. I thought I did a few days ago. But I am not sure anymore.

by **jlaake** » Tue Aug 26, 2014 12:23 pm

Gary is probably the best person to answer this but I don't know if he sees RMark messages. You don't want to use a sin link for many problems. For c-r with an identity DM it works better for counting parameters when some are at boundaries. Beyond that I would not suggest it for std glm problems where you want to use covariates and non-identity DMs.

--jeff

by **gwhite** » Tue Aug 26, 2014 12:34 pm

The sine link is most useful when parameter estimates are at the boundary, because the 2nd derivative of the likelihood is non-zero for the sine link, whereas you get a zero (within numerical precision) for the links that are asymptotic at the boundary. This behavior shows up when computing the rank of the information matrix to determine the number of parameters that were actually estimated, with the sine link providing a correct answer much more often than the other link functions that constrain parameters between 0 and 1.
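The boundary behaviour can be illustrated numerically. The sketch below assumes hypothetical data in which all 10 trials are successes, so the estimate of the probability is 1. The sine link reaches that value at beta = pi/2, while the logit link only approaches it as beta grows, so beta = 20 is used as an arbitrary stand-in for "large". The function names and the finite-difference helper are illustrative and are not how MARK computes its information matrix.

```r
# Hypothetical all-success data: MLE of p is on the boundary p = 1
n <- 10

# Negative log-likelihoods when every trial is a success
nll_logit <- function(beta) n * log1p(exp(-beta))            # -n * log(plogis(beta))
nll_sin   <- function(beta) -n * log((sin(beta) + 1) / 2)

# Central finite-difference approximation to the second derivative
second_deriv <- function(f, x, h = 1e-3) {
  (f(x + h) - 2 * f(x) + f(x - h)) / h^2
}

d2_sin   <- second_deriv(nll_sin, pi / 2)  # curvature at the sine-link boundary
d2_logit <- second_deriv(nll_logit, 20)    # curvature far out on the logit scale

d2_sin     # clearly non-zero (about n / 2)
d2_logit   # numerically indistinguishable from zero
```

A curvature near zero is what makes a parameter look unestimated when the rank of the information matrix is computed.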

MARK uses (sin(beta)+1)/2, but the original reference used sin(beta)^2. See Box, M. J. 1966. A Comparison of Several Current Optimization Methods, and the use of Transformations in Constrained Problems. The Computer Journal 9:67–77.
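The two forms appear to differ only by a rescaling and shift of beta, since sin(beta)^2 = (1 - cos(2 beta)) / 2 = (sin(2 beta - pi/2) + 1) / 2. That identity is a standard trigonometric fact rather than something stated in the reference, and it can be checked numerically:

```r
beta <- seq(-3, 3, by = 0.5)

p_box  <- sin(beta)^2                        # form in the original reference
p_mark <- (sin(2 * beta - pi / 2) + 1) / 2   # MARK-style form, rescaled beta

same <- isTRUE(all.equal(p_box, p_mark))
same
# [1] TRUE
```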

As the MARK documentation discusses, the sine link is generally not useful for continuous covariates because it is not monotonic. Of course, sometimes non-monotonic relationships can be useful.
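A small illustration of the non-monotonicity, using made-up coefficients `beta0` and `beta1` and a covariate `x` running from 0 to 6:

```r
# Hypothetical intercept, slope, and covariate values
beta0 <- 0
beta1 <- 1
x <- 0:6

# Fitted probability under the sine link
p <- (sin(beta0 + beta1 * x) + 1) / 2

round(p, 2)      # rises to about 0.95 at x = 2, falls to about 0.02 at x = 5, then rises again
sign(diff(p))    # mixture of +1 and -1: the direction of the effect changes with x
```

Under a logit link with the same slope, the fitted probability would only increase with x.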

Gary

by **cooch** » Tue Aug 26, 2014 2:44 pm

gwhite wrote: As the MARK documentation discusses, the sine link is generally not useful for continuous covariates because it is not monotonic. Of course, sometimes non-monotonic relationships can be useful.

See the MARK helpfile, and a longer treatment of this subject in the MARK book (the sidebar starting on p. 22 of chapter 6).

SIN link
