Uniqueness of MLE for Poisson Regression


Maximum likelihood parameter estimation for Poisson regression is presented in many places, but I could not find out whether the MLE that one obtains via gradient descent or the Newton-Raphson method is unique.


BEST ANSWER

It is indeed unique, because the negative log-likelihood is strictly convex. I will show that here.

On a probability space $(\Omega, \mathscr{F}, P)$ let $y_i : \Omega \to \mathbb{N}$ be $n$ independent Poisson random variables given some unknown parameters $\theta \in \mathbb{R}^m$ and corresponding "feature" vectors $b_i \in \mathbb{R}^m$. Assume that $m$ of the $n$ feature vectors are linearly independent (this is almost always the case in practice, as $n \gg m$).

The likelihood model for basic Poisson regression is: \begin{align} p\big(y_1,y_2,\ldots,y_n|\theta\big) &= \prod_{i=1}^n \text{Pois}\big(y_i;\lambda_i=e^{b_i^\intercal \theta}\big)\\[3pt] &= \prod_{i=1}^n \frac{\big(e^{b_i^\intercal \theta}\big)^{y_i} e^{-\big(e^{b_i^\intercal \theta}\big)}}{y_i!} \end{align}
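As a quick sketch (illustrative data and function names, not from the original post; numpy and scipy assumed available), the negative log-likelihood of this model can be written directly from the product above:

```python
import numpy as np
from scipy.special import gammaln  # log(y!) = gammaln(y + 1)

def neg_log_likelihood(theta, B, y):
    """Negative log-likelihood of Poisson regression with rates
    lambda_i = exp(b_i^T theta); B stacks the feature vectors b_i as rows."""
    eta = B @ theta  # linear predictors b_i^T theta
    # -log p = sum_i [ exp(eta_i) - y_i * eta_i + log(y_i!) ]
    return np.sum(np.exp(eta) - y * eta + gammaln(y + 1))
```

This is the objective that gradient descent or Newton-Raphson would minimize; the `gammaln` term is constant in $\theta$ and is often dropped in practice.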

The maximizing $\theta$ must be a stationary point: \begin{align} \frac{d}{d\theta}\log p\big(y_1,y_2,\ldots,y_n|\theta\big) &= 0\\[3pt] \frac{d}{d\theta} \sum_{i=1}^n \Big( \log e^{y_i b_i^\intercal \theta} + \log e^{-e^{b_i^\intercal \theta}} - \log y_i! \Big) &= 0\\[3pt] \sum_{i=1}^n \Big( \frac{d}{d\theta} y_i b_i^\intercal \theta - \frac{d}{d\theta} e^{b_i^\intercal \theta} - \frac{d}{d\theta} \log y_i! \Big) &= 0\\[3pt] \sum_{i=1}^n \Big( y_i b_i^\intercal - e^{b_i^\intercal \theta}b_i^\intercal \Big) &= 0\\[3pt] \sum_{i=1}^n \big(y_i - e^{b_i^\intercal \theta}\big)b_i &= 0 \end{align}
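The score (the gradient in the last line) is easy to sanity-check numerically against a finite-difference approximation of the log-likelihood; this is a sketch with made-up data, assuming numpy:

```python
import numpy as np

def score(theta, B, y):
    """Gradient of the log-likelihood: sum_i (y_i - exp(b_i^T theta)) b_i."""
    return B.T @ (y - np.exp(B @ theta))

def log_likelihood(theta, B, y):
    # log(y_i!) terms dropped: they are constant in theta
    eta = B @ theta
    return np.sum(y * eta - np.exp(eta))

# Central finite differences at an arbitrary theta should match the score
B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 5.0])
theta = np.array([0.3, -0.2])
eps = 1e-6
fd = np.array([(log_likelihood(theta + eps * e, B, y)
                - log_likelihood(theta - eps * e, B, y)) / (2 * eps)
               for e in np.eye(2)])
g = score(theta, B, y)
```

Setting `score` to zero is exactly the stationarity condition derived above.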

For a stationary point to be the unique global maximum (so that methods like gradient ascent or Newton-Raphson converge to it), the log-likelihood must be strictly concave, i.e., its Hessian must be negative-definite (written $\prec 0$): \begin{align} \frac{d^2}{d\theta^2}\log p\big(y_1,y_2,\ldots,y_n|\theta\big) &\overset{?}{\prec} 0\\[3pt] \frac{d}{d\theta} \sum_{i=1}^n \Big( y_i b_i^\intercal - e^{b_i^\intercal \theta}b_i^\intercal \Big) &\overset{?}{\prec} 0\\[3pt] - \sum_{i=1}^n e^{b_i^\intercal \theta}b_i b_i^\intercal &\prec 0 \end{align}

We do: $e^{b_i^\intercal \theta} > 0$ for every $\theta$, each $b_i b_i^\intercal$ is positive-semidefinite, and the sum contains $m$ linearly independent rank-1 terms, so the sum is positive-definite and the Hessian is negative-definite. Note: even if that last condition fails, we still retain concavity ($\small{\implies}$ any stationary point is a global maximum), just not strict concavity ($\small{\implies}$ the global maximum is unique).
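To see the uniqueness concretely, here is a minimal Newton-Raphson sketch (illustrative data, not from the original post; numpy assumed): two different starting points land on the same maximizer, and the Hessian there has all-negative eigenvalues.

```python
import numpy as np

def newton_poisson(theta0, B, y, tol=1e-10, max_iter=100):
    """Newton-Raphson on the Poisson regression log-likelihood."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        lam = np.exp(B @ theta)      # rates exp(b_i^T theta)
        grad = B.T @ (y - lam)       # score from the derivation above
        hess = -(B.T * lam) @ B      # Hessian: -sum_i lam_i * b_i b_i^T
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0, 9.0])
t1 = newton_poisson(np.zeros(2), B, y)
t2 = newton_poisson(np.array([0.5, 0.5]), B, y)
# Eigenvalues of the Hessian at the fitted point: all negative,
# confirming negative-definiteness (strict concavity) there
eigs = np.linalg.eigvalsh(-(B.T * np.exp(B @ t1)) @ B)
```

With $m$ linearly independent feature vectors (here $m=2$, $n=4$), the Hessian is negative-definite everywhere, so this is the unique MLE regardless of the starting point.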