SIN link


Post by markmiller » Mon Aug 25, 2014 5:50 am

I am attempting to develop a better understanding of the SIN link used by Program MARK. In particular, I am wondering whether the SIN link can be used with glm in R to estimate proportions and compare proportions among groups (assuming p = 1). The SIN link is not available as a built-in option in glm.

I can find little in the published literature about the SIN link. I have posted a couple of questions on Stack Overflow and Cross Validated, but have received little feedback.

Where can I read about the SIN link in the published literature? At one time there was a PDF devoted to the statistics of Program MARK. Is that document still available? I suspect the source code of RMark does not implement the SIN link directly, but rather instructs MARK to implement it. Does someone have R code for implementing the SIN link in glm?
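For anyone wanting to experiment, glm accepts a user-defined link supplied as an object of class "link-glm". The following is only a minimal sketch under that assumption: the name sineLink is made up for illustration, and the inverse link (sin(eta) + 1) / 2 is the form MARK is reported to use. It fits an intercept-only model to 8 successes in 10 trials.

```r
# Hypothetical custom link object for glm(); "sineLink" is an illustrative name
sineLink <- structure(
  list(
    linkfun  = function(mu)  asin(2 * mu - 1),      # eta = g(mu)
    linkinv  = function(eta) (sin(eta) + 1) / 2,    # mu = g^{-1}(eta)
    mu.eta   = function(eta) cos(eta) / 2,          # d mu / d eta
    valideta = function(eta) TRUE,
    name     = "sine"
  ),
  class = "link-glm"
)

# 2 failures and 8 successes, intercept-only model
y <- c(0, 0, rep(1, 8))
fit <- glm(y ~ 1, family = binomial(link = sineLink))

betaHat <- unname(coef(fit))
pHat <- (sin(betaHat) + 1) / 2   # back-transformed proportion
pHat
```

Because the inverse link is periodic, this sketch is only meant for intercept-only or group-indicator models, not for models with continuous covariates.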

Here are links to the above-mentioned Stack Overflow and Cross Validated questions, if you are interested:

http://stackoverflow.com/questions/2543 ... regression

http://stats.stackexchange.com/question ... regression

http://stats.stackexchange.com/question ... a-sin-link

Thank you for any guidance. I hope this question is not off-topic.
markmiller
 
Posts: 49
Joined: Fri Nov 08, 2013 6:23 pm

Re: SIN link

Post by markmiller » Tue Aug 26, 2014 6:42 am

Instead of trying to implement the SIN link in glm in R, I decided to write the negative log-likelihood for the logit link and minimize it with optim. I then repeated that approach, simply substituting the SIN link. The R code for both is below. Both give the correct point estimate to three decimal places, although the logit link came closer. I was expecting better performance from the SIN link than what I observed. Perhaps I made an error somewhere.

I will update this thread if I learn more.


Code:
my.data <- read.table(text='
   y  x
   0  1
   0  1
   1  1
   1  1
   1  1
   1  1
   1  1
   1  1
   1  1
   1  1
', header = TRUE)


# create the design matrices
vY = as.matrix(my.data$y)
mX = as.matrix(my.data$x)

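# negative log-likelihood under the logit link
# (negated because optim minimizes by default)
logLikelihoodLogit = function(vBeta, mX, vY) {
  p = 1 / (1 + exp(-(mX %*% vBeta)))
  return( -sum( vY * log(p) + (1 - vY) * log(1 - p) ) )
}


# negative log-likelihood under the SIN link
# (uses %*% for the linear predictor, as in the logit version)
logLikelihoodSin = function(vBeta, mX, vY) {
  p = (sin(mX %*% vBeta) + 1) / 2
  return( -sum( vY * log(p) + (1 - vY) * log(1 - p) ) )
}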


# set initial parameter values

vBeta0 = c(0.5) # arbitrary starting parameters



# minimize the negative log-likelihood for logit link

optimLogit = optim(vBeta0, logLikelihoodLogit, mX = mX, vY = vY, method = 'BFGS', hessian=TRUE)
optimLogit$par

# [1] 1.386305

(1/(1+exp(-(optimLogit$par))))

# [1] 0.8000017


# minimize the negative log-likelihood for SIN link

optimSin   = optim(vBeta0, logLikelihoodSin  , mX = mX, vY = vY, method = 'BFGS', hessian=TRUE)
optimSin$par

# [1] 0.6439062

(sin(optimSin$par) + 1) / 2

# [1] 0.800162

Re: SIN link

Post by markmiller » Tue Aug 26, 2014 8:57 am

I forgot that when estimating only one parameter it is better to use method='Brent'.

When I do that I obtain better point estimates. Both links now return a point estimate matching the expected value:

Code:
optimLogit.b = optim(vBeta0, logLikelihoodLogit, mX = mX, vY = vY, method='Brent', lower = -20, upper = 20)
optimLogit.b

(1/(1+exp(-(optimLogit.b$par))))
# [1] 0.8

optimSin.b = optim(vBeta0, logLikelihoodSin, mX = mX, vY = vY, method='Brent', lower = -20, upper = 20)
optimSin.b

(sin(optimSin.b$par) + 1) / 2
# [1] 0.8


It might be okay to use lower = 0, upper = 1 with optimSin.b.
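On the choice of bounds, one option (my assumption, not something taken from the MARK documentation) is to restrict beta to [-pi/2, pi/2], where (sin(beta) + 1) / 2 is one-to-one and covers every probability from 0 to 1. By comparison, the interval [0, 1] only covers probabilities from 0.5 to roughly 0.92. A minimal sketch with the same 8-out-of-10 data, using the illustrative name negLogLikSin:

```r
# Same data as above: 2 failures and 8 successes
y <- c(0, 0, rep(1, 8))

# Negative log-likelihood under the sine link, one parameter
negLogLikSin <- function(beta, y) {
  p <- (sin(beta) + 1) / 2
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

# Brent search restricted to the interval where the inverse link is one-to-one;
# the starting value is required by optim but not used by method = "Brent"
fitSin <- optim(0.5, negLogLikSin, y = y, method = "Brent",
                lower = -pi / 2, upper = pi / 2)

pHat <- (sin(fitSin$par) + 1) / 2
pHat
```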

Re: SIN link

Post by jlaake » Tue Aug 26, 2014 11:26 am

It sounds like you have this under control. The sin link and all of the other links in RMark are in inverse.link, and the derivatives are in deriv.inverse.link. They are used to compute real parameters. Those functions are not exported by RMark, so to use or see them, type RMark:::inverse.link or RMark:::deriv.inverse.link.

--jeff
jlaake
 
Posts: 1480
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: SIN link

Post by markmiller » Tue Aug 26, 2014 11:49 am

Thank you, Jeff. Some heavy hitters over at Cross Validated are asking me why anyone would ever want to use the sine link.

http://stats.stackexchange.com/question ... regression

I am in a little over my head.

I tried suggesting that the sine link performs better than the logit link at boundaries (probability is 0 or 1). But I think that suggestion was dismissed.

I also said the sine link is more-or-less the default in MARK if there are no covariates, i.e., if there is only one entry per row in the design matrix (regardless of whether that matrix is an identity matrix). I agreed that problems could arise if you use the sine link along with covariates.

I think the sine link can be used to compare probabilities among groups provided there is no intercept. For example, you could compare the probability of getting a 'head' from three different coins, each flipped 20 times (in MARK, by setting p = 1).
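The coin comparison can be sketched as follows. The head counts are hypothetical numbers chosen for illustration, the design matrix is an identity matrix (one parameter per coin, no intercept), and negLogLikSin is an illustrative name, so this is a sketch under those assumptions rather than what MARK does internally.

```r
# Hypothetical data: heads observed in 20 flips of each of three coins
heads <- c(8, 10, 15)
flips <- c(20, 20, 20)
mX <- diag(3)   # identity design matrix: one beta per coin

# Negative binomial log-likelihood under the sine link
negLogLikSin <- function(vBeta, mX, heads, flips) {
  p <- (sin(mX %*% vBeta) + 1) / 2
  -sum(heads * log(p) + (flips - heads) * log(1 - p))
}

fit <- optim(rep(0, 3), negLogLikSin, mX = mX, heads = heads, flips = flips,
             method = "BFGS", hessian = TRUE)

# Back-transform to probabilities
pHat <- as.vector((sin(mX %*% fit$par) + 1) / 2)

# Delta-method standard errors, using dp/dbeta = cos(beta) / 2
seBeta <- sqrt(diag(solve(fit$hessian)))
sePHat <- abs(cos(fit$par) / 2) * seBeta

cbind(pHat, sePHat)
```

The estimates should land close to the raw proportions heads / flips, and the delta-method standard errors close to sqrt(p * (1 - p) / n), which is a useful sanity check on the link.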

Anyway, if anyone feels inclined, please feel free to tell the folks at Cross Validated why anyone would ever want to use the sine link! I am not entirely sure I know the answer. I thought I did a few days ago. But I am not sure anymore.

Re: SIN link

Post by jlaake » Tue Aug 26, 2014 12:23 pm

Gary is probably the best person to answer this, but I don't know if he sees RMark messages. You don't want to use a sin link for many problems. For capture-recapture (c-r) models with an identity design matrix (DM), it works better for counting parameters when some are at boundaries. Beyond that, I would not suggest it for standard glm problems where you want to use covariates and non-identity DMs.

--jeff

Re: SIN link

Post by gwhite » Tue Aug 26, 2014 12:34 pm

The sine link is most useful when parameter estimates are at the boundary, because the 2nd derivative of the likelihood is non-zero for the sine link, whereas you get a zero (within numerical precision) for the links that are asymptotic at the boundary. This behavior shows up when computing the rank of the information matrix to determine the number of parameters that were actually estimated, with the sine link providing a correct answer much more often than the other link functions that constrain parameters between 0 and 1.
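A small numerical illustration of this point, offered as a sketch rather than as MARK's actual computation: take a hypothetical sample of n = 10 trials that are all successes, so the estimate of the probability sits at the boundary of 1. The function names and the finite-difference helper are illustrative, and beta = 20 stands in for the "large beta" an optimizer would stop at under the logit link.

```r
n <- 10  # hypothetical: 10 trials, all successes

# Negative log-likelihoods when every trial is a success: -n * log(p)
nllSin   <- function(b) -n * log((sin(b) + 1) / 2)
nllLogit <- function(b) -n * plogis(b, log.p = TRUE)

# Central finite-difference approximation to the second derivative
secondDeriv <- function(f, b, h = 1e-3) {
  (f(b + h) - 2 * f(b) + f(b - h)) / h^2
}

d2Sin   <- secondDeriv(nllSin, pi / 2)   # sine link: p = 1 at beta = pi/2
d2Logit <- secondDeriv(nllLogit, 20)     # logit link: p -> 1 as beta grows

c(sine = d2Sin, logit = d2Logit)
```

For the sine link the second derivative at the boundary works out to about n / 2 = 5, while for the logit link it is on the order of 1e-8, which is effectively zero when the rank of the information matrix is computed.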


MARK uses (sin(beta)+1)/2, but the original reference used sin(beta)^2. See Box, M. J. 1966. A Comparison of Several Current Optimization Methods, and the use of Transformations in Constrained Problems. The Computer Journal 9:67–77.

As the MARK documentation discusses, the sine link is generally not useful for continuous covariates because it is not monotonic. Of course, sometimes non-monotonic relationships can be useful.
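To see the non-monotonic behavior, here is a brief sketch with a hypothetical continuous covariate x and made-up coefficients (intercept 0, slope 1.5). The names invSin, eta, pSin, and pLogit are illustrative only.

```r
# Inverse sine link in the form MARK uses
invSin <- function(beta) (sin(beta) + 1) / 2

x   <- seq(-3, 3, by = 0.5)   # hypothetical covariate values
eta <- 0 + 1.5 * x            # hypothetical intercept and slope

pSin   <- invSin(eta)
pLogit <- plogis(eta)

data.frame(x, eta, pSin = round(pSin, 3), pLogit = round(pLogit, 3))

# The sine-link curve both rises and falls over this range of x ...
sinNonMonotone <- any(diff(pSin) > 0) && any(diff(pSin) < 0)
# ... while the logit-link curve only rises
logitMonotone <- all(diff(pLogit) > 0)
```

With an identity design matrix this does not matter, because each beta maps to a single probability and any beta giving the same probability is equally good.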

Gary
gwhite
 
Posts: 340
Joined: Fri May 16, 2003 9:05 am

Re: SIN link

Post by cooch » Tue Aug 26, 2014 2:44 pm

gwhite wrote: As the MARK documentation discusses, the sine link is generally not useful for continuous covariates because it is not monotonic. Of course, sometimes non-monotonic relationships can be useful.


See the MARK help file, and a longer treatment of this subject in the MARK book (the sidebar starting on p. 22 of chapter 6).
cooch
 
Posts: 1654
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

