Every source I have read says that there is no closed-form solution for the $\beta$ coefficients in Poisson regression, but I have not seen an explanation of why. I tried to solve for $\beta$ on my own to see why, and I arrived at a solution. I think I have figured out where I went wrong, but I am not sure. In my notation, when $\exp()$ and $\log()$ are applied to a vector, assume they are applied element-wise.
Starting with the Poisson log-likelihood:
$$ l(\beta) = \sum_{i=1}^m \Big( y_i \beta^T x_i - e^{\beta^T x_i} - \log( y_i ! ) \Big) $$
and taking the gradient,
$$ \frac{\partial}{\partial \beta}l(\beta) = \sum_{i=1}^m \Big( y_i x_i - e^{\beta^T x_i} x_i \Big), $$
which can be written in matrix form (where the $i$-th row of $X$ is $x_i^T$ and $Y = (y_1, \dots, y_m)^T$) as
$$ \frac{\partial}{\partial \beta}l(\beta) = X^TY - X^T e^{X \beta }. $$
Setting this equal to the zero vector gives
$$ X^T Y = X^T e^{X \beta }. $$
Multiplying both sides on the left by $(X X^T)^{-1} X$:
$$ (X X^T)^{-1} X X^T Y = (X X^T)^{-1} X X^T e^{X \beta } \leftarrow ( \text{mistake here}) $$
$$ Y = e^{X \beta} $$
$$ \log (Y) = X \beta $$
Multiplying both sides on the left by $(X^T X)^{-1} X^T$:
$$ ( X^T X)^{-1} X^T \log (Y) = ( X^T X)^{-1} X^T X \beta $$
$$ (X^T X)^{-1} X^T \log(Y) = \beta $$
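Since there is no closed form, $X^T Y = X^T e^{X\beta}$ has to be solved iteratively. Below is a minimal sketch, assuming NumPy and made-up data; the helper names `poisson_loglik`, `poisson_score` and `fit_poisson_newton`, the seed, and the true coefficients are all hypothetical. It uses Newton-Raphson (the Hessian of $l(\beta)$ is $-X^T \operatorname{diag}(e^{X\beta}) X$) with step halving, which is one common choice and not the only way to fit the model.

```python
import numpy as np


def poisson_loglik(beta, X, y):
    """Poisson log-likelihood l(beta), dropping the constant -sum(log(y_i!))."""
    eta = X @ beta
    return float(y @ eta - np.exp(eta).sum())


def poisson_score(beta, X, y):
    """Gradient of l(beta) in matrix form: X^T Y - X^T exp(X beta)."""
    return X.T @ (y - np.exp(X @ beta))


def fit_poisson_newton(X, y, tol=1e-10, max_iter=100):
    """Solve the score equation X^T Y = X^T exp(X beta) by Newton-Raphson.

    The Hessian of l(beta) is -X^T W X with W = diag(exp(X beta)), so each
    step solves (X^T W X) delta = score. Step halving stops the
    log-likelihood from decreasing if a full step overshoots.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])  # X^T W X (= -Hessian), W never formed
        delta = np.linalg.solve(info, score)
        step, ll_old = 1.0, poisson_loglik(beta, X, y)
        while poisson_loglik(beta + step * delta, X, y) < ll_old and step > 1e-8:
            step /= 2
        beta = beta + step * delta
        if np.max(np.abs(step * delta)) < tol:
            break
    return beta


# Hypothetical synthetic data: an intercept plus one Gaussian covariate.
rng = np.random.default_rng(0)
m = 500
X = np.column_stack([np.ones(m), rng.normal(size=m)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

beta_hat = fit_poisson_newton(X, y)
mu_hat = np.exp(X @ beta_hat)

print("beta_hat:", beta_hat)
print("score at beta_hat:", poisson_score(beta_hat, X, y))  # close to the zero vector
print("max |y - exp(X beta_hat)|:", np.max(np.abs(y - mu_hat)))  # not close to zero
print("number of zero counts (log(y) = -inf):", int(np.sum(y == 0)))
```

At the fitted $\hat\beta$ the score $X^T(Y - e^{X\hat\beta})$ is numerically zero, yet $Y - e^{X\hat\beta}$ itself is not: the residual vector only has to be orthogonal to the columns of $X$. The zero counts also make $\log(Y)$ unusable in practice.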
I've marked my mistake with "$\leftarrow (\text{mistake here})$": the column vectors of $X^T$ are not linearly independent, so $X X^T$ is not invertible. They are linearly dependent because $X^T$ has more columns than rows (there are more observations than coefficients). I am aware that the result $Y = e^{X\beta}$ does not make sense: it would mean the model reproduces every count exactly, and $\log(y_i)$ is undefined whenever $y_i = 0$.
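The rank argument can be checked numerically. This is a small sketch, assuming NumPy; the $6 \times 2$ design matrix (an intercept column plus one random column) is made up for illustration. $X X^T$ is $m \times m$ but its rank equals the rank of $X$, which is at most the number of coefficients, so it cannot be inverted, while $X^T X$ can be when $X$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2  # more observations (rows of X) than coefficients (columns of X)
X = np.column_stack([np.ones(m), rng.normal(size=m)])  # hypothetical m x n design matrix

gram_outer = X @ X.T  # m x m: the matrix inverted at the marked step
gram_inner = X.T @ X  # n x n: the matrix inverted in the least-squares step

# rank(X X^T) = rank(X) <= n < m, so the 6 x 6 matrix X X^T is singular.
print("rank of X X^T:", np.linalg.matrix_rank(gram_outer), "size:", gram_outer.shape)
# X^T X is invertible because X has full column rank here.
print("rank of X^T X:", np.linalg.matrix_rank(gram_inner), "size:", gram_inner.shape)
```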