Given a set of outcomes $y_i\in\mathbb N$ and a set of corresponding intensities $\lambda_i \in \mathbb R^+$ with $i\in[1,k]$ going through the set of $k$ samples, the Poisson log-likelihood is:
$$L = \sum_{i=1}^k - \lambda_i + y_i \ln \lambda_i - \ln y_i !$$
The model is linear in the sense that the intensities are:
$$\lambda_i = \beta^T X_i$$
where $X_i \in \mathbb R^n$ are feature vectors and $\beta \in \mathbb R^n$ are the linear coefficients that we want to estimate using maximum likelihood:
$$\frac{dL}{d\beta} = \sum_{i=1}^k -X_i + \frac{y_i X_i}{\beta^T X_i} = 0$$
which leads to:
$$\sum_{i=1}^k X_i = \sum_{i=1}^k \frac{y_i X_i}{\beta^T X_i}$$
Now, this is unfortunately non-linear in $\beta$, but it doesn't seem too evil so my gut feeling is that there is a simple solution, I just can't seem to be able to figure it out. If someone would be able to point me in the right direction that would be highly appreciated.
Note: doing the same exercise with a Gaussian distribution instead of a Poisson leads to the simple linear regression which can be solved the usual way:
$$L = \sum_{i=1}^k (y_i - \mu_i )^2$$
$$\mu_i = \beta^T X_i$$
$$\frac{dL}{d\beta} = \sum_{i=1}^k (y_i - \beta^T X_i) X_i^T = 0$$
$$\beta = \left(\sum_{i=1}^k X_i X_i^T\right)^{-1} \sum_{i=1}^k X_i y_i $$
which makes me think that the Poisson case should also have a simple solution.
Thanks!