Maximum Likelihood for $\beta_1$ Using Partial Derivatives


Assume $y_i = \beta_0 + \beta_1 x_i + e_i$ with $e_i\sim N(0,\sigma^2)$. One way to estimate $\mathbf{\beta}$ is by maximum likelihood. I saw elsewhere that $\mathbf{\beta}$ could also be estimated using $$\hat{\mathbf{\beta}}_{\text{ML}} = \beta_0 - \left( \frac{\partial^2 L} {\partial \mathbf{\beta^2}}\right)^{-1} \frac{\partial L}{\partial \mathbf{\beta}},$$ where ML stands for maximum likelihood and $L$ is the likelihood function. How valid is this? Could anyone give an insightful explanation and perhaps some references?


First, it's worth pointing out that $\beta_0$ has two different meanings in the above. There's the first entry of the coefficient vector $\beta = (\beta_0,\beta_1)$; there's also the initial point $\beta_{0} = (\beta_{0,0},\beta_{0,1})$ in an iterative search for that vector. To avoid confusion, I'll refer to the latter as $\beta_{init}$.

The formula $$\hat{\mathbf{\beta}} = \beta_{init} - \left( \frac{\partial^2 L} {\partial \mathbf{\beta^2}}\right)^{-1} \frac{\partial L}{\partial \mathbf{\beta}}$$ does not in general give the MLE. However, the same formula applied to the log-likelihood $l$, $$\hat{\mathbf{\beta}}_{ML} = \beta_{init} - \left( \frac{\partial^2 l} {\partial \mathbf{\beta^2}}\right)^{-1} \frac{\partial l}{\partial \mathbf{\beta}}$$ reduces to the standard expression $(X^TX)^{-1}X^TY$ for the MLE, for any value of $\beta_{init}$.
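To see why, write $X$ for the design matrix with rows $(1, x_i)$ and $Y = (y_1,\dots,y_n)^T$. Up to an additive constant, $l(\beta) = -\frac{1}{2\sigma^2}\|Y - X\beta\|^2$, so $$\frac{\partial l}{\partial \mathbf{\beta}} = \frac{1}{\sigma^2}X^T(Y - X\beta), \qquad \frac{\partial^2 l}{\partial \mathbf{\beta^2}} = -\frac{1}{\sigma^2}X^TX.$$ Substituting into the update (the $\sigma^2$ factors cancel), $$\hat{\mathbf{\beta}}_{ML} = \beta_{init} + (X^TX)^{-1}X^T(Y - X\beta_{init}) = (X^TX)^{-1}X^TY,$$ which is independent of $\beta_{init}$.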

This formula can be recognised as one step of Newton's method for optimising a function. In general, Newton's method requires many iterations to converge; for this regression model, however, the log-likelihood is quadratic in $\beta$, so a single Newton step from any starting point lands exactly on the maximum.
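As a quick numerical check, here is a minimal NumPy sketch (the simulated data, coefficient values, and variable names are mine, chosen only for illustration) verifying that one Newton step on the log-likelihood recovers $(X^TX)^{-1}X^TY$ from any starting point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from y_i = beta_0 + beta_1 x_i + e_i (true betas are arbitrary)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix: intercept column, then x
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

sigma2 = 0.25                          # assumed known noise variance

def newton_step(beta_init):
    """One Newton step on the log-likelihood l(beta)."""
    grad = X.T @ (y - X @ beta_init) / sigma2   # dl/dbeta
    hess = -(X.T @ X) / sigma2                  # d^2 l / dbeta^2
    return beta_init - np.linalg.solve(hess, grad)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)    # (X^T X)^{-1} X^T Y

# One step from any starting point lands on the OLS/ML estimate:
print(np.allclose(newton_step(np.zeros(2)), beta_ols))           # True
print(np.allclose(newton_step(np.array([5.0, -3.0])), beta_ols))  # True
```

Note that the $\sigma^2$ factors in the gradient and Hessian cancel in the update, so the result does not depend on the assumed noise variance.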