I apologize in advance for the mess you're about to read.
I have recently been introduced to estimators, and right after method-of-moments estimators we covered maximum likelihood estimators. But I'm failing to really understand them. They just seem harder to compute, because the likelihood function is a product of mass/density functions.
Moreover, I have a problem that asks "Why would it be difficult to find an MLE for $(k,p)$?" given a sample $X_{1},\dots,X_{n}$ taken from $\operatorname{Bin}(k,p)$, and I have no idea why. Is it because the likelihood function $$L(k,p) = \prod_{i=1}^{n}f_{X}(x_{i};k,p) = \prod_{i=1}^{n}\binom{k}{x_{i}}p^{x_i}(1-p)^{k-x_{i}}$$ is hard to compute or maximise? Or maybe because of the product of binomial coefficients?
Any help is much appreciated!
Given $k$, the maximum likelihood estimator for $p$ is $\hat p_k=\frac{\frac1n\sum x_i}{k}$. Substituting that for $p$, the likelihood as a function of $k$ (an integer with $k \ge \max(x_i)$) becomes $$\begin{aligned} &\prod_{i=1}^{n}\binom{k}{x_{i}}\left(\frac{\frac1n\sum x_i}{k}\right)^{x_i}\left(1-\frac{\frac1n\sum x_i}{k}\right)^{k-x_{i}} \\ &= \frac{(k!)^n}{(nk)^{nk} \prod x_i!\,(k-x_i)!}\left(\sum x_i\right)^{\sum x_i}\left(nk-{\sum x_i}\right)^{nk-\sum x_i} \end{aligned}$$ which looks difficult to handle even after dropping multiplicative terms that do not depend on $k$. You could perhaps do an integer search, starting at $\max(x_i)$ and working upwards.
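The integer search suggested above can be sketched in Python. This is only a minimal illustration: the function names `profile_loglik` and `search_k` and the cap `k_max` are hypothetical choices, not part of the original problem. The code works with the logarithm of the likelihood, with $p$ already replaced by $\hat p_k$, and uses `math.lgamma` for the log-factorials so that large $k$ does not overflow.

```python
import math

def profile_loglik(k, xs):
    """Log-likelihood of Bin(k, p) evaluated at p = mean(xs) / k.

    Returns -inf when k < max(xs), since such a k cannot produce the data.
    """
    n, s = len(xs), sum(xs)
    if k < max(xs):
        return -math.inf
    # Sum of log binomial coefficients: log k! - log x! - log (k - x)!
    ll = sum(
        math.lgamma(k + 1) - math.lgamma(x + 1) - math.lgamma(k - x + 1)
        for x in xs
    )
    # s * log(p_hat) + (n*k - s) * log(1 - p_hat), with p_hat = s / (n*k).
    # The guards skip terms of the form 0 * log(0).
    if s > 0:
        ll += s * math.log(s / (n * k))
    if n * k > s:
        ll += (n * k - s) * math.log1p(-s / (n * k))
    return ll

def search_k(xs, k_max=10_000):
    """Brute-force search over integers from max(xs) up to k_max.

    k_max is an arbitrary cap (an assumption of this sketch).
    """
    start = max(max(xs), 1)
    return max(range(start, k_max + 1), key=lambda k: profile_loglik(k, xs))

if __name__ == "__main__":
    sample = [42, 56, 40, 41, 54]
    print(search_k(sample, k_max=500))
```

If the value returned equals `k_max`, that is a hint that the likelihood is still rising at the cap, so raising the cap will not produce a finite answer by itself.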
You would not necessarily expect the result to be unbiased, though that is a relatively minor issue. More worrying is that the estimate of $k$ can be very sensitive to the exact data values, so that changes well within ordinary random variation may shift it substantially.
But I think that is not the real difficulty. There seem to be some sets of observations for which the likelihood is an increasing function of $k$, so that the maximum likelihood solution would be $\hat k= \infty$.
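As a small contrived example, observing $x_1=2, x_2=0$ leads to this effect: with $\hat p_k = \frac1k$ the likelihood simplifies to $\frac12\left(1-\frac1k\right)^{2k-1}$, which increases with $k$ towards $\frac{1}{2e^2}$. Less artificially, a case of $x_1=42, x_2=56, x_3=40, x_4=41, x_5=54$ seemed to lead to this problem (generated as one of several binomial samples with $k=100$, $p=\frac12$). I would guess you have the same issue with the method of moments when the sample variance is not smaller than the sample mean, since the estimate $\hat k = \frac{\bar x^2}{\bar x - s^2}$ is then negative or undefined.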
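Both examples can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the variance uses divisor $n$, which is an assumption (the original does not say which divisor is meant). It relies on the standard fact that $\operatorname{Bin}(k,\lambda/k)$ tends to $\operatorname{Poisson}(\lambda)$ as $k\to\infty$, so the Poisson log-likelihood at $\lambda=\bar x$ is the limit that the binomial log-likelihood approaches when it keeps increasing in $k$.

```python
import math

def moment_estimates(xs):
    """Method-of-moments (k, p) for Bin(k, p), or None if variance >= mean.

    Uses mean = k*p and variance = k*p*(1 - p); the variance is computed
    with divisor n (an assumption of this sketch).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var >= mean:
        return None  # p = 1 - var/mean would be <= 0
    p = 1 - var / mean
    return mean / p, p

def profile_loglik(k, xs):
    """Binomial log-likelihood at p = mean(xs) / k, for integer k >= max(xs)."""
    n, s = len(xs), sum(xs)
    ll = sum(
        math.lgamma(k + 1) - math.lgamma(x + 1) - math.lgamma(k - x + 1)
        for x in xs
    )
    if s > 0:
        ll += s * math.log(s / (n * k))
    if n * k > s:
        ll += (n * k - s) * math.log1p(-s / (n * k))
    return ll

def poisson_loglik(xs):
    """Log-likelihood of Poisson(mean(xs)): the k -> infinity limit, k*p fixed."""
    lam = sum(xs) / len(xs)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

if __name__ == "__main__":
    for sample in ([2, 0], [42, 56, 40, 41, 54]):
        print(sample, "method of moments:", moment_estimates(sample))
        for k in (100, 1_000, 10_000):
            print("  k =", k, "log-likelihood =", profile_loglik(k, sample))
        print("  Poisson limit =", poisson_loglik(sample))
```

For a sample where the variance is below the mean, such as the made-up data `[3, 5, 4, 6, 2]`, `moment_estimates` returns a finite pair instead of `None`.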