Linear regression

In linear regression, we considered models of the form \[y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}+ ... + \beta_px_{pi} + \epsilon_i\] where \(\epsilon_i \sim N(0, \hat{\sigma}^2)\) Alternatively, we can write \[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i}+ ... + \hat{\beta}_px_{pi}\] where \(Y_i \sim N(\hat{y}_i, \hat{\sigma}^2)\) For linear regression, each of the responses is assumed to be from a normal distribution with mean equal to the expected value and a common variance estimated by \(\hat{\sigma}^2\) . Also, the response is continuous and can potentially take on any real value.

In many situations the assumptions of the linear regression model will not be appropriate. One such case is in modeling a dichotomous response (a response that can only take on two possible values).

Logistic Regression

In such cases we would instead prefer to model the probability of observing one of the possible outcomes. We can code our dichotomous response variable as 0’s and 1’s and model the probability π of being in case 1 (or case 0 if we prefer) given the predictors. Our binary response is then distributed according to a Bernoulli distribution \[Y_i \sim Ber(\hat{\pi}_i)\] and the expected value of \(Y_i\) is \(\hat{y}_i = \hat{\pi}_i\)

The odds of an event is the probability that an event happens divided by the probability that event doesn’t happen. In the logistic regression model, we model the log odds as a linear function \[log\big(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\big) = \hat{\beta}_0 + \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i}+ ... + \hat{\beta}_px_{pi}\] The expected odds would then be given by \[\frac{\hat{\pi}_i}{1-\hat{\pi}_i} = exp\big(\hat{\beta}_0 + \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i}+ ... + \hat{\beta}_px_{pi}\big)\] Thus the estimated probability is \[\hat{\pi}_i = \frac{exp\big(\hat{\beta}_0 + \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i}+ ... + \hat{\beta}_px_{pi}\big)}{1+exp\big(\hat{\beta}_0 + \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i}+ ... + \hat{\beta}_px_{pi}\big)}\] where the \(exp\) is the exponential function Several things need to notice:

If we have counts of events out of a possible number of trial (e.g. if we count the number of heads out of n coin tosses), our counts would follow binomial distributions instead of Bernoulli (recall that Bernoulli is a binomial with n=1), and we would now have \[Y_i \sim Bin(n_i,\hat{\pi}_i)\] and the expected value of \(Y_i\) is \(\hat{y}_i = n_i\hat{\pi}_i\)

############################################################################ 
# South African Heart Disease Data 
############################################################################ 
# The Coronary Risk‐Factor Study data involve 462 males between the ages of 15 and 64 from 
# a heart‐disease high‐risk region of the Western Cape, South Africa.
# The response is "chd", the presence (chd=1) or absence (chd=0) of coronary heart disease. 
# There are 9 covariates:
# sbp:  systolic blood pressure cumulative tobacco (kg)
# tobacco:  cumulative tobacco (kg)
# ldl:  low densiity lipoprotein cholesterol
# adiposity
# famhist:  family history of heart disease (Present, Absent)
# typea:  type‐A behavior
# obesity
# alcohol:  current alcohol consumption
# age:  age at onset

Heart = read.table('https://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data',
                  sep=",",head=T,row.names=1)
attach(Heart)
head(Heart)
HeartFull = glm(chd~., data=Heart, family=binomial)
summary(HeartFull)
HeartStep = step(HeartFull, scope=list(upper=~., lower=~1), k=2)
summary(HeartStep)
phat=HeartStep$fitted.values
mypred = (phat>0.5)
table(Heart$chd, mypred)
error=(46+73)/nrow(Heart)
error
LS0tCnRpdGxlOiAiU1RBVDM2MTJfMjMxMl9EQVRBTUlOSU5HXzdUSF9UVVRPUklBTCIKYXV0aG9yOiAiWU9VIEppYSIKZGF0ZTogIjMvMjAvMjAxNyIKb3V0cHV0OgogIGh0bWxfZG9jdW1lbnQ6CiAgICBoaWdobGlnaHQ6IHRhbmdvCiAgICBudW1iZXJfc2VjdGlvbnM6IHllcwogICAgdGhlbWU6IHBhcGVyCiAgICB0b2M6IHllcwogICAgdG9jX2RlcHRoOiAzCiAgaHRtbF9ub3RlYm9vazogZGVmYXVsdAotLS0KIyBMaW5lYXIgcmVncmVzc2lvbgoKSW4gbGluZWFyIHJlZ3Jlc3Npb24sIHdlIGNvbnNpZGVyZWQgbW9kZWxzIG9mIHRoZSBmb3JtCiQkeV9pID0gXGJldGFfMCArIFxiZXRhXzF4X3sxaX0gKyBcYmV0YV8yeF97Mml9KyAuLi4gKyBcYmV0YV9weF97cGl9ICsgXGVwc2lsb25faSQkCndoZXJlICRcZXBzaWxvbl9pIFxzaW0gTigwLCBcaGF0e1xzaWdtYX1eMikkCkFsdGVybmF0aXZlbHksIHdlIGNhbiB3cml0ZSAKJCRcaGF0e3l9X2kgPSBcaGF0e1xiZXRhfV8wICsgXGhhdHtcYmV0YX1fMXhfezFpfSArIFxoYXR7XGJldGF9XzJ4X3syaX0rIC4uLiArIFxoYXR7XGJldGF9X3B4X3twaX0kJAp3aGVyZSAkWV9pIFxzaW0gTihcaGF0e3l9X2ksIFxoYXR7XHNpZ21hfV4yKSQKRm9yIGxpbmVhciByZWdyZXNzaW9uLCBlYWNoIG9mIHRoZSByZXNwb25zZXMgaXMgYXNzdW1lZCB0byBiZSBmcm9tIGEgbm9ybWFsIGRpc3RyaWJ1dGlvbiB3aXRoIG1lYW4gZXF1YWwgdG8gdGhlIGV4cGVjdGVkIHZhbHVlIGFuZCBhIGNvbW1vbiB2YXJpYW5jZSBlc3RpbWF0ZWQgYnkgJFxoYXR7XHNpZ21hfV4yJCAuIEFsc28sIHRoZSByZXNwb25zZSBpcyBjb250aW51b3VzIGFuZCBjYW4gcG90ZW50aWFsbHkgdGFrZSBvbiBhbnkgcmVhbCB2YWx1ZS4KCkluIG1hbnkgc2l0dWF0aW9ucyB0aGUgYXNzdW1wdGlvbnMgb2YgdGhlIGxpbmVhciByZWdyZXNzaW9uIG1vZGVsIHdpbGwgbm90IGJlIGFwcHJvcHJpYXRlLiBPbmUgc3VjaCBjYXNlIGlzIGluIG1vZGVsaW5nIGEgZGljaG90b21vdXMgcmVzcG9uc2UgKGEgcmVzcG9uc2UgdGhhdCBjYW4gb25seSB0YWtlIG9uIHR3byBwb3NzaWJsZSB2YWx1ZXMpLiAKCiMgTG9naXN0aWMgUmVncmVzc2lvbgpJbiBzdWNoIGNhc2VzIHdlIHdvdWxkIGluc3RlYWQgcHJlZmVyIHRvIG1vZGVsIHRoZSBwcm9iYWJpbGl0eSBvZiBvYnNlcnZpbmcgb25lIG9mIHRoZSBwb3NzaWJsZSBvdXRjb21lcy4gV2UgY2FuIGNvZGUgb3VyIGRpY2hvdG9tb3VzIHJlc3BvbnNlIHZhcmlhYmxlIGFzIDDigJlzIGFuZCAx4oCZcyBhbmQgbW9kZWwgdGhlIHByb2JhYmlsaXR5IM+AIG9mIGJlaW5nIGluIGNhc2UgMSAob3IgY2FzZSAwIGlmIHdlIHByZWZlcikgZ2l2ZW4gdGhlIHByZWRpY3RvcnMuIE91ciBiaW5hcnkgcmVzcG9uc2UgaXMgdGhlbiBkaXN0cmlidXRlZCBhY2NvcmRpbmcgdG8gYSBCZXJub3VsbGkgZGlzdHJpYnV0aW9uCiQkWV9pIFxzaW0gQmVyKFxoYXR7XHBpfV9pKSQkCmFuZCB0aGUgZXhwZWN0ZWQgdmFsdWUgb2YgJFlfaSQgaXMgJFxoYXR7eX1faSA9IFxoYXR7XHBpfV9pJAoKVGhlIG9kZHMgb2YgYW4gZXZlbnQgaXMgdGhlIHByb2JhYmlsaXR5IHRoYXQgYW4gZXZlbnQgaGFwcGVucyBkaXZpZGVkIGJ5IHRoZSBwcm9iYWJpbGl0eSB0aGF0IGV2ZW50CmRvZXNu4oCZdCBoYXBwZW4uIEluIHRoZSBsb2dpc3RpYyByZWdyZXNzaW9uIG1vZGVsLCB3ZSBtb2RlbCB0aGUgbG9nIG9kZHMgYXMgYSBsaW5lYXIgZnVuY3Rpb24KJCRsb2dcYmlnKFxmcmFje1xoYXR7XHBpfV9pfXsxLVxoYXR7XHBpfV9pfVxiaWcpID0gXGhhdHtcYmV0YX1fMCArIFxoYXR7XGJldGF9XzF4X3sxaX0gKyBcaGF0e1xiZXRhfV8yeF97Mml9KyAuLi4gKyBcaGF0e1xiZXRhfV9weF97cGl9JCQKVGhlIGV4cGVjdGVkIG9kZHMgd291bGQgdGhlbiBiZSBnaXZlbiBieQokJFxmcmFje1xoYXR7XHBpfV9pfXsxLVxoYXR7XHBpfV9pfSA9IGV4cFxiaWcoXGhhdHtcYmV0YX1fMCArIFxoYXR7XGJldGF9XzF4X3sxaX0gKyBcaGF0e1xiZXRhfV8yeF97Mml9KyAuLi4gKyBcaGF0e1xiZXRhfV9weF97cGl9XGJpZykkJApUaHVzIHRoZSBlc3RpbWF0ZWQgcHJvYmFiaWxpdHkgaXMgCiQkXGhhdHtccGl9X2kgPSBcZnJhY3tleHBcYmlnKFxoYXR7XGJldGF9XzAgKyBcaGF0e1xiZXRhfV8xeF97MWl9ICsgXGhhdHtcYmV0YX1fMnhfezJpfSsgLi4uICsgXGhhdHtcYmV0YX1fcHhfe3BpfVxiaWcpfXsxK2V4cFxiaWcoXGhhdHtcYmV0YX1fMCArIFxoYXR7XGJldGF9XzF4X3sxaX0gKyBcaGF0e1xiZXRhfV8yeF97Mml9KyAuLi4gKyBcaGF0e1xiZXRhfV9weF97cGl9XGJpZyl9JCQKd2hlcmUgdGhlICRleHAkIGlzIHRoZSBleHBvbmVudGlhbCBmdW5jdGlvbgpTZXZlcmFsIHRoaW5ncyBuZWVkIHRvIG5vdGljZToKCiogV2Ugbm8gbG9uZ2VyIGhhdmUgYW4gYXNzdW1wdGlvbiBvZiBub3JtYWxpdHkKCiogRXN0aW1hdGlvbiBmb3IgdGhlICRcaGF0e1xiZXRhfV9qJCB3aWxsIGNvbWUgZnJvbSBtYXhpbXVtIGxpa2VsaWhvb2QgZXN0aW1hdGlvbiBhc3N1bWluZyBCZXJub3VsbGkKKG9yIGJpbm9taWFsKSBkaXN0cmlidXRlZCByZXNwb25zZXMKCiogVGhlICRcaGF0e1xwaX1faSQgdmFsdWVzIHdpbGwgZmFsbCBiZXR3ZWVuIDAgYW5kIDEKCiogSG9sZGluZyBhbGwgb3RoZXIgcHJlZGljdG9yIHZhcmlhYmxlcyBjb25zdGFudCwgYSBjaGFuZ2Ugb2YgMSBpbiB0aGUgJGpfdGgkIHByZWRpY3RvciB3aWxsIHJlc3VsdCBpbiBhCmNoYW5nZSBvZiAkZXhwKFxoYXR7XGJldGF9X2opJCBpbiB0aGUgb2RkcwoKKiBUaGUgb2RkcyByYXRpbyBiZXR3ZWVuIHR3byBzZXRzIG9mIGNvbmRpdGlvbnMgKHZhbHVlcyBmb3IgdGhlIHByZWRpY3RvcnMgdmFyaWFibGVzKSB3aWxsIHRlbGwgdXMgaG93IG11Y2ggZ3JlYXRlciAob3IgbGVzcykgdGhlIG9kZHMgb2YgYW4gZXZlbnQgaXMgdW5kZXIgb25lIHNldCBvZiBjb25kaXRpb25zIHRoYW4gdW5kZXIgYW5vdGhlciAoZS5nLiBob3cgbWFueSB0aW1lcyBncmVhdGVyIGlzIHRoZSBvZGRzIGZvciBmZW1hbGVzIHRvIGVuY291bnRlciBhIHNwZWNpZmljIGV2ZW50IHRoYW4gZm9yIG1hbGVzLCBvciBob3cgbXVjaCBncmVhdGVyIGlzIHRoZSBvZGRzIGZvciBhbiBldmVudCB0byBoYXBwZW4gaWYgYSBjb250aW51b3VzIHByZWRpY3RvciBpbmNyZWFzZXMgYnkgMSkKCiogQ29uZmlkZW5jZSBpbnRlcnZhbHMgZm9yIHRoZSAkXGhhdHtcYmV0YX1faiQgd2lsbCBiZSBzeW1tZXRyaWMsIGJ1dCBjb25maWRlbmNlIGludGVydmFscyBmb3IgdGhlIG9kZHMgYW5kIG9kZHMgcmF0aW9zIHdpbGwgbm90IGJlIHN5bW1ldHJpYyBiZWNhdXNlIHRoZXkgYXJlIGJhc2VkIG9uIGV4cG9uZW50aWFscyBvZiB0aGUgc3ltbWV0cmljIGludGVydmFscy4KCklmIHdlIGhhdmUgY291bnRzIG9mIGV2ZW50cyBvdXQgb2YgYSBwb3NzaWJsZSBudW1iZXIgb2YgdHJpYWwgKGUuZy4gaWYgd2UgY291bnQgdGhlIG51bWJlciBvZiBoZWFkcyBvdXQgb2YgbiBjb2luIHRvc3NlcyksIG91ciBjb3VudHMgd291bGQgZm9sbG93IGJpbm9taWFsIGRpc3RyaWJ1dGlvbnMgaW5zdGVhZCBvZiBCZXJub3VsbGkgKHJlY2FsbCB0aGF0IEJlcm5vdWxsaSBpcyBhIGJpbm9taWFsIHdpdGggbj0xKSwgYW5kIHdlIHdvdWxkIG5vdyBoYXZlCiQkWV9pIFxzaW0gQmluKG5faSxcaGF0e1xwaX1faSkkJAphbmQgdGhlIGV4cGVjdGVkIHZhbHVlIG9mICRZX2kkIGlzICRcaGF0e3l9X2kgPSBuX2lcaGF0e1xwaX1faSQKCgpgYGB7cn0KIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyAKIyBTb3V0aCBBZnJpY2FuIEhlYXJ0IERpc2Vhc2UgRGF0YSAKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyAKIyBUaGUgQ29yb25hcnkgUmlza+KAkEZhY3RvciBTdHVkeSBkYXRhIGludm9sdmUgNDYyIG1hbGVzIGJldHdlZW4gdGhlIGFnZXMgb2YgMTUgYW5kIDY0IGZyb20gCiMgYSBoZWFydOKAkGRpc2Vhc2UgaGlnaOKAkHJpc2sgcmVnaW9uIG9mIHRoZSBXZXN0ZXJuIENhcGUsIFNvdXRoIEFmcmljYS4KIyBUaGUgcmVzcG9uc2UgaXMgImNoZCIsIHRoZSBwcmVzZW5jZSAoY2hkPTEpIG9yIGFic2VuY2UgKGNoZD0wKSBvZiBjb3JvbmFyeSBoZWFydCBkaXNlYXNlLiAKIyBUaGVyZSBhcmUgOSBjb3ZhcmlhdGVzOgojIHNicDogIHN5c3RvbGljIGJsb29kIHByZXNzdXJlIGN1bXVsYXRpdmUgdG9iYWNjbyAoa2cpCiMgdG9iYWNjbzogIGN1bXVsYXRpdmUgdG9iYWNjbyAoa2cpCiMgbGRsOiAgbG93IGRlbnNpaXR5IGxpcG9wcm90ZWluIGNob2xlc3Rlcm9sCiMgYWRpcG9zaXR5CiMgZmFtaGlzdDogIGZhbWlseSBoaXN0b3J5IG9mIGhlYXJ0IGRpc2Vhc2UgKFByZXNlbnQsIEFic2VudCkKIyB0eXBlYTogIHR5cGXigJBBIGJlaGF2aW9yCiMgb2Jlc2l0eQojIGFsY29ob2w6ICBjdXJyZW50IGFsY29ob2wgY29uc3VtcHRpb24KIyBhZ2U6ICBhZ2UgYXQgb25zZXQKCkhlYXJ0ID0gcmVhZC50YWJsZSgnaHR0cHM6Ly9zdGF0d2ViLnN0YW5mb3JkLmVkdS9+dGlicy9FbGVtU3RhdExlYXJuL2RhdGFzZXRzL1NBaGVhcnQuZGF0YScsCiAgICAgICAgICAgICAgICAgIHNlcD0iLCIsaGVhZD1ULHJvdy5uYW1lcz0xKQphdHRhY2goSGVhcnQpCmhlYWQoSGVhcnQpCkhlYXJ0RnVsbCA9IGdsbShjaGR+LiwgZGF0YT1IZWFydCwgZmFtaWx5PWJpbm9taWFsKQpzdW1tYXJ5KEhlYXJ0RnVsbCkKYGBgCgpgYGB7ciwgY2FjaGU9RkFMU0V9CkhlYXJ0U3RlcCA9IHN0ZXAoSGVhcnRGdWxsLCBzY29wZT1saXN0KHVwcGVyPX4uLCBsb3dlcj1+MSksIGs9MikKc3VtbWFyeShIZWFydFN0ZXApCmBgYApgYGB7cn0KcGhhdD1IZWFydFN0ZXAkZml0dGVkLnZhbHVlcwpteXByZWQgPSAocGhhdD4wLjUpCnRhYmxlKEhlYXJ0JGNoZCwgbXlwcmVkKQplcnJvcj0oNDYrNzMpL25yb3coSGVhcnQpCmVycm9yCmBgYAoKCgoKCg==