I would appreciate help with finding the estimate $\hat{\beta}_{ML}$ for a Borel distribution. I am doing something wrong, probably in the likelihood function, and therefore my final answer is also wrong. The question is from an old exam, without a solution, in a course on statistical inference theory.
Suppose that we have an iid sample $X_{1:n} = (X_1, X_2, \ldots, X_n)$ from a Borel distribution with probability mass function of $X_i$ given by $$P(X_i=x;\beta)=\frac{1}{x!}(\beta x)^{x-1}e^{-\beta x}, \quad x=1,2,\ldots$$ Derive the maximum likelihood estimate $\hat{\beta}_{ML}$ for $\beta$.
I am trying to do this by (1) computing the likelihood function $L(\beta)$, (2) computing the log-likelihood $l(\beta)$, (3) computing the score function $S(\beta)$, and finally (4) solving the score equation to obtain the estimate $\hat{\beta}_{ML}$. However, I am having trouble getting the correct likelihood and log-likelihood, and as a result all the following steps are incorrect.
My solution: I started with (1) and got $$L(\beta)=\frac{1}{\prod_{k=1}^nx_k!}\exp\left(-\beta \sum_{k=1}^n x_k\right)\prod_{k=1}^n(\beta x_k)^{x_k-1}$$ and left it like that, since I found no way to simplify $\prod_{k=1}^n(\beta x_k)^{x_k-1}$ further. I then continued to (2) and got $$l(\beta) = -\log\left(\sum_{k=1}^n x_k\right)-\beta\sum_{k=1}^nx_k+\left(\sum_{k=1}^n x_k-1\right)\log\left(\sum_{k=1}^n\beta x_k\right).$$ From this I tried to find the score function in order to solve the score equation $S(\beta)=0$, but without success. My attempt at the score function was $$S(\beta)=-\sum_{k=1}^nx_k+\frac{\sum_{k=1}^nx_k-1}{\sum_{k=1}^n\beta x_k}\cdot\sum_{k=1}^nx_k=-\sum_{k=1}^nx_k + \frac{\sum_{k=1}^nx_k-1}{\beta n}.$$
However, solving $S(\beta)=0$ does not give me the correct answer, which should be $\hat{\beta}_{ML}=1-\frac{1}{\bar{x}}$. Could someone help me see what I am doing wrong? I am guessing that I should do something more with the product that I left in $L(\beta)$; what am I missing here? Thanks in advance!
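Let $\bar x$ be the sample mean. The factor $x^{x-1}/x!$ does not depend on $\beta$, so the kernel of the likelihood for a single observation is $$\mathcal L(\beta \mid x) \propto \beta^{x-1} e^{-\beta x} = \beta^{-1} (\beta e^{-\beta})^x.$$ Then the kernel for the likelihood of the full sample is $$\mathcal L(\beta \mid x_{1:n}) \propto \beta^{-n} (\beta e^{-\beta})^{n \bar x}.$$ Consequently the log-likelihood is, up to an additive constant that does not involve $\beta$, $$\ell(\beta \mid x_{1:n}) = -n \log \beta + n\bar x \log (\beta e^{-\beta}) = n\left[-\beta \bar x + (\bar x - 1) \log \beta\right].$$ Computing the derivative with respect to $\beta$ yields $$\frac{\partial \ell}{\partial \beta} = n\left[-\bar x + \frac{\bar x - 1}{\beta}\right],$$ hence for $\bar x > 1$ the unique critical point is $$\hat \beta = 1 - \frac{1}{\bar x},$$ and it is a maximum because $\partial^2 \ell / \partial \beta^2 = -n(\bar x - 1)/\beta^2 < 0$. As for your attempt: the logarithm of a product is a sum of logarithms, so $\log \prod_{k=1}^n (\beta x_k)^{x_k-1} = \sum_{k=1}^n (x_k - 1)\log(\beta x_k)$, not $\left(\sum_{k=1}^n x_k - 1\right)\log\left(\sum_{k=1}^n \beta x_k\right)$.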
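If it helps to convince yourself of the closed form, here is a minimal numerical sketch that compares $1 - 1/\bar x$ with a direct maximization of the full log-likelihood. The function names (`borel_loglik`, `closed_form_mle`, `numeric_mle`) and the sample values are made up for illustration, and the golden-section search is just one convenient choice of optimizer; it assumes $\bar x > 1$, so that the log-likelihood has a single interior maximum on $(0, 1]$.

```python
import math


def borel_loglik(beta, xs):
    """Full log-likelihood of an iid Borel sample xs at parameter beta in (0, 1]."""
    total = 0.0
    for x in xs:
        # log P(X = x) = (x - 1) log(beta x) - beta x - log(x!)
        total += (x - 1) * math.log(beta * x) - beta * x - math.lgamma(x + 1)
    return total


def closed_form_mle(xs):
    """Closed-form estimate 1 - 1/xbar derived above."""
    xbar = sum(xs) / len(xs)
    return 1.0 - 1.0 / xbar


def numeric_mle(xs, lo=1e-9, hi=1.0, tol=1e-10):
    """Golden-section search for the maximizer of the log-likelihood on [lo, hi].

    Assumes the log-likelihood is unimodal on the interval, which holds
    when the sample mean exceeds 1.
    """
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if borel_loglik(c, xs) > borel_loglik(d, xs):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


if __name__ == "__main__":
    sample = [1, 2, 1, 3, 1, 1, 4, 2]  # hypothetical data, not from the exam
    print("closed form:", closed_form_mle(sample))
    print("numeric    :", numeric_mle(sample))
```

The two printed values should agree to several decimal places. If every observation equals 1, then $\bar x = 1$ and the likelihood is maximized at the boundary $\beta = 0$, which this sketch does not handle.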