Uniqueness of MLE for Poisson Regression


Maximum likelihood parameter estimation for Poisson regression is presented in many places, but I could not find out whether the MLE that one obtains via gradient descent or the Newton-Raphson method is unique.


BEST ANSWER

It is indeed unique, because the negative log-likelihood is strictly convex. I will show that here.

On a probability space $(\Omega, \mathscr{F}, P)$ let $y_i : \Omega \to \mathbb{N}$ be $n$ independent Poisson random variables given some unknown parameters $\theta \in \mathbb{R}^m$ and corresponding "feature" vectors $b_i \in \mathbb{R}^m$. Assume that $m$ of the $n$ feature vectors are linearly independent (this is almost always the case in practice, as $n \gg m$).

The likelihood model for basic Poisson regression is: \begin{align} p\big(y_1,y_2,\ldots,y_n|\theta\big) &= \prod_{i=1}^n \text{Pois}\big(y_i;\lambda_i=e^{b_i^\intercal \theta}\big)\\[3pt] &= \prod_{i=1}^n \frac{\big(e^{b_i^\intercal \theta}\big)^{y_i} e^{-\big(e^{b_i^\intercal \theta}\big)}}{y_i!} \end{align}
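As a quick sketch (illustrative data and function names, not from the original post; numpy and scipy assumed available), the negative log-likelihood of this model can be written directly from the product above:

```python
import numpy as np
from scipy.special import gammaln  # log(y!) = gammaln(y + 1)

def neg_log_likelihood(theta, B, y):
    """Negative log-likelihood of Poisson regression with rates
    lambda_i = exp(b_i^T theta); B stacks the feature vectors b_i as rows."""
    eta = B @ theta  # linear predictors b_i^T theta
    # -log p = sum_i [ exp(eta_i) - y_i * eta_i + log(y_i!) ]
    return np.sum(np.exp(eta) - y * eta + gammaln(y + 1))
```

This is the objective that gradient descent or Newton-Raphson would minimize; the `gammaln` term is constant in $\theta$ and is often dropped in practice.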

The maximizing $\theta$ must be a stationary point: \begin{align} \frac{d}{d\theta}\log p\big(y_1,y_2,\ldots,y_n|\theta\big) &= 0\\[3pt] \frac{d}{d\theta} \sum_{i=1}^n \Big( \log e^{y_i b_i^\intercal \theta} + \log e^{-e^{b_i^\intercal \theta}} - \log y_i! \Big) &= 0\\[3pt] \sum_{i=1}^n \Big( \frac{d}{d\theta} y_i b_i^\intercal \theta - \frac{d}{d\theta} e^{b_i^\intercal \theta} - \frac{d}{d\theta} \log y_i! \Big) &= 0\\[3pt] \sum_{i=1}^n \Big( y_i b_i^\intercal - e^{b_i^\intercal \theta}b_i^\intercal \Big) &= 0\\[3pt] \sum_{i=1}^n \big(y_i - e^{b_i^\intercal \theta}\big)b_i &= 0 \end{align}
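The score (the gradient in the last line) is easy to sanity-check numerically against a finite-difference approximation of the log-likelihood; this is a sketch with made-up data, assuming numpy:

```python
import numpy as np

def score(theta, B, y):
    """Gradient of the log-likelihood: sum_i (y_i - exp(b_i^T theta)) b_i."""
    return B.T @ (y - np.exp(B @ theta))

def log_likelihood(theta, B, y):
    # log(y_i!) terms dropped: they are constant in theta
    eta = B @ theta
    return np.sum(y * eta - np.exp(eta))

# Central finite differences at an arbitrary theta should match the score
B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 5.0])
theta = np.array([0.3, -0.2])
eps = 1e-6
fd = np.array([(log_likelihood(theta + eps * e, B, y)
                - log_likelihood(theta - eps * e, B, y)) / (2 * eps)
               for e in np.eye(2)])
g = score(theta, B, y)
```

Setting `score` to zero is exactly the stationarity condition derived above.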

For a stationary point to be the unique global maximum (so that methods like gradient ascent or Newton-Raphson converge to it), the log-likelihood must be strictly concave, i.e., its Hessian must be negative-definite (written $\prec 0$): \begin{align} \frac{d^2}{d\theta^2}\log p\big(y_1,y_2,\ldots,y_n|\theta\big) &\overset{?}{\prec} 0\\[3pt] \frac{d}{d\theta} \sum_{i=1}^n \Big( y_i b_i^\intercal - e^{b_i^\intercal \theta}b_i^\intercal \Big) &\overset{?}{\prec} 0\\[3pt] - \sum_{i=1}^n e^{b_i^\intercal \theta}b_i b_i^\intercal &\prec 0 \end{align}

We do: $e^{b_i^\intercal \theta} > 0$ for every $\theta$, each $b_i b_i^\intercal$ is positive-semidefinite, and the sum contains $m$ linearly independent rank-1 terms, so the sum is positive-definite and the Hessian is negative-definite. Note: even if that last condition fails, we still retain concavity ($\small{\implies}$ any stationary point is a global maximum), just not strict concavity ($\small{\implies}$ the global maximum is unique).
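To see the uniqueness concretely, here is a minimal Newton-Raphson sketch (illustrative data, not from the original post; numpy assumed): two different starting points land on the same maximizer, and the Hessian there has all-negative eigenvalues.

```python
import numpy as np

def newton_poisson(theta0, B, y, tol=1e-10, max_iter=100):
    """Newton-Raphson on the Poisson regression log-likelihood."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        lam = np.exp(B @ theta)      # rates exp(b_i^T theta)
        grad = B.T @ (y - lam)       # score from the derivation above
        hess = -(B.T * lam) @ B      # Hessian: -sum_i lam_i * b_i b_i^T
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0, 9.0])
t1 = newton_poisson(np.zeros(2), B, y)
t2 = newton_poisson(np.array([0.5, 0.5]), B, y)
# Eigenvalues of the Hessian at the fitted point: all negative,
# confirming negative-definiteness (strict concavity) there
eigs = np.linalg.eigvalsh(-(B.T * np.exp(B @ t1)) @ B)
```

With $m$ linearly independent feature vectors (here $m=2$, $n=4$), the Hessian is negative-definite everywhere, so this is the unique MLE regardless of the starting point.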