How can we derive the Normal equations for Logistic Regression?

I was wondering, from a numerical linear algebra point of view: since solving the OLS problem $\|b-Ax\|_{2}=\min_{w\in\mathbb{R}^{n}}\|b-Aw\|_{2}$ is equivalent to solving the normal system $$ A^{T}Ax=A^{T}b, $$ what would the normal equations look like for a logistic regression problem?

Edit: If the answer is that no such normal equations exist, how can we benefit from the QR and SVD decompositions to somehow solve logistic regression problems?

Best Answer

In the linear response model $Y = X^\intercal \beta + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$, you can show that the maximum likelihood estimator is obtained by solving the normal equations. The normal equations are nice because they require a single linear system solve. The situation is not quite as nice in logistic regression, but it's not terribly bad either. Let me explain...
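As a minimal sketch of that single linear system solve (using NumPy on randomly generated data; the matrix sizes and variable names here are my own, and I assume $A$ has full column rank so that $A^{T}A$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))   # design matrix (full column rank w.h.p.)
b = rng.standard_normal(100)        # response vector

# Solve the normal system  A^T A x = A^T b  with one linear solve.
x = np.linalg.solve(A.T @ A, A.T @ b)

# Sanity check against NumPy's least-squares solver (which uses the SVD).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```

In practice one would solve the least-squares problem through a QR or SVD factorization of $A$ rather than forming $A^{T}A$ explicitly, since squaring the matrix squares its condition number.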

In binary (0/1) classification, the log-likelihood is \begin{align*} \ell(\theta)\equiv\log\mathcal{L}(\theta) & =\log\prod_{i}\mathbb{P}(Y=y_{i}\mid X=x_{i})\\ & =\log\prod_{i}\mathbb{P}(Y=1\mid X=x_{i})^{y_{i}}\mathbb{P}(Y=0\mid X=x_{i})^{1-y_{i}}\\ & =\sum_{i}y_{i}\log\mathbb{P}(Y=1\mid X=x_{i})+\left(1-y_{i}\right)\log\mathbb{P}(Y=0\mid X=x_{i}). \end{align*} Logistic regression is a model in which the log odds are linear in $\theta$. This is equivalent to saying $$ \mathbb{P}(Y=1\mid X=x_{i})=\frac{1}{1+e^{-x_{i}^{\intercal}\theta}}. $$ If you plug this into the expression for $\ell(\theta)$, you obtain the log-likelihood for logistic regression. The maximum likelihood estimator $\hat{\theta}$ is the value of $\theta$ which maximizes $\ell(\theta)$ (or, equivalently, $\mathcal{L}(\theta)$). Unlike in the linear model, there is no closed-form solution, so you obtain an approximation of $\hat{\theta}$ using an iterative numerical method. A simple example of such a method is gradient descent.
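For instance, here is a minimal gradient ascent sketch on $\ell(\theta)$ (NumPy, with synthetic data; the function name, learning rate, and iteration count are my own choices). The gradient used below, $\nabla\ell(\theta)=\sum_{i}x_{i}\left(y_{i}-\sigma(x_{i}^{\intercal}\theta)\right)$ where $\sigma$ is the logistic function, follows by differentiating the log-likelihood above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=5000):
    """Gradient ascent on the logistic log-likelihood ell(theta).

    The (averaged) gradient is X^T (y - sigmoid(X theta)) / n.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta += lr / len(y) * (X.T @ (y - sigmoid(X @ theta)))
    return theta

# Toy data: the true theta determines the log odds of each label.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
theta_true = np.array([2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ theta_true)).astype(float)

theta_hat = fit_logistic_gd(X, y)
```

Since $\ell(\theta)$ is concave, gradient ascent with a small enough step size converges to the maximum likelihood estimator, though in practice second-order methods (Newton-type iterations) are usually preferred for their faster convergence.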